A confusion matrix is a statistical tool used in machine learning and data analysis to evaluate the performance of a classification model. It is a table that summarizes how well the model performs by comparing its predicted labels with the actual labels of a set of observations. The confusion matrix is also known as an error matrix, contingency table, or classification matrix.
For a binary classifier, the confusion matrix consists of four components: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). These components count the correct and incorrect predictions made by the model. Here is a simple example to illustrate them:
Suppose a model is trained to predict whether a patient has a certain disease based on their medical history. The model predicts that 100 patients have the disease; 80 of them actually do (true positives), while 20 do not (false positives). It also predicts that 900 patients do not have the disease; 850 of them indeed do not (true negatives), while the remaining 50 actually do (false negatives). The confusion matrix for this example looks like this:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP = 80 | FN = 50 |
| Actual Negative | FP = 20 | TN = 850 |
- True positive (TP): the number of patients who actually have the disease and were correctly identified by the model as having it.
- False positive (FP): the number of patients who do not have the disease but were incorrectly identified by the model as having it.
- True negative (TN): the number of patients who do not have the disease and were correctly identified by the model as not having it.
- False negative (FN): the number of patients who actually have the disease but were incorrectly identified by the model as not having it.
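To make this concrete, here is a minimal sketch in Python showing how these four counts could be obtained with scikit-learn's `confusion_matrix`. The `y_true` and `y_pred` arrays are synthetic and simply reproduce the counts from the example above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic labels reproducing the example: 1 = has the disease, 0 = does not.
# 80 TP, 50 FN, 20 FP, 850 TN (1000 patients in total).
y_true = np.array([1] * 80 + [1] * 50 + [0] * 20 + [0] * 850)
y_pred = np.array([1] * 80 + [0] * 50 + [1] * 20 + [0] * 850)

# By default, rows are actual classes and columns are predicted classes,
# ordered [0, 1], so the layout is [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)               # [[850  20]
                        #  [ 50  80]]
print(tp, fp, tn, fn)   # 80 20 850 50
```

Note that scikit-learn places the negative class in the first row and column by default, so this layout is transposed relative to the table above; the four counts are the same.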
From the confusion matrix, we can calculate various performance metrics for the model. Here are some commonly used metrics, with the values they take in the example above:

- Accuracy: the proportion of all predictions that are correct, (TP + TN) / (TP + TN + FP + FN) = (80 + 850) / 1000 = 0.93.
- Precision: the proportion of predicted positives that are actually positive, TP / (TP + FP) = 80 / 100 = 0.80.
- Recall (sensitivity): the proportion of actual positives that the model correctly identifies, TP / (TP + FN) = 80 / 130 ≈ 0.615.
- Specificity: the proportion of actual negatives that the model correctly identifies, TN / (TN + FP) = 850 / 870 ≈ 0.977.
- F1 score: the harmonic mean of precision and recall, 2 × (precision × recall) / (precision + recall) ≈ 0.696.
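As a short sketch, these metrics can be computed directly from the four counts; the values below correspond to the disease-prediction example.

```python
# Counts from the example confusion matrix.
tp, fp, tn, fn = 80, 20, 850, 50

accuracy = (tp + tn) / (tp + tn + fp + fn)           # 0.93
precision = tp / (tp + fp)                           # 0.80
recall = tp / (tp + fn)                              # ~0.615 (sensitivity)
specificity = tn / (tn + fp)                         # ~0.977
f1 = 2 * precision * recall / (precision + recall)   # ~0.696

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f} f1={f1:.3f}")
```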
The confusion matrix can also be used to visualize the performance of the model. For example, we can plot a heat map of the matrix to show the proportion of correct and incorrect predictions.
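One possible way to produce such a heat map is scikit-learn's `ConfusionMatrixDisplay`; the class labels used here are illustrative and the matrix is the one from the example above.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Confusion matrix from the example, in scikit-learn's [[TN, FP], [FN, TP]] layout.
cm = np.array([[850, 20],
               [50, 80]])

disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=["No disease", "Disease"])
disp.plot(cmap="Blues")                     # darker cells = more observations
plt.title("Confusion matrix for the disease-prediction example")
plt.show()
```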