Confusion Matrix in Machine Learning

Aniruddha Bhandari Last Updated : 25 Feb, 2025

Have you expected great results from your machine learning model, only to get poor accuracy? You’ve put in the effort, so what went wrong? How can you fix it? There are many ways to assess your classification model, but the confusion matrix is one of the most reliable options. It shows how well your model performed and where it made errors, helping you improve. Beginners often find the confusion matrix confusing, but it’s actually simple and powerful. This tutorial will explain what a confusion matrix in machine learning is and how it provides a complete view of your model’s performance.

Despite its name, you’ll see that a confusion matrix is straightforward and effective. Let’s explore the confusion matrix together!

In this article, you will explore the confusion matrix, the formulas built on it, and its significance in analyzing classification metrics, giving you a comprehensive understanding of model performance evaluation.

Learning Objectives

Learn what a confusion matrix is and understand the various terms related to it.
Learn to use a confusion matrix for multi-class classification.
Learn to implement a confusion matrix using scikit-learn in Python.

What is a Confusion Matrix?
Important Terms in a Confusion Matrix
Why Do We Need a Confusion Matrix?
How to Calculate Confusion Matrix for a 2-class Classification Problem?
Precision vs. Recall
Confusion Matrix Using Scikit-learn in Python
Confusion Matrix for Multi-Class Classification
Conclusion
Frequently Asked Questions

What is a Confusion Matrix?

A confusion matrix is a performance evaluation tool in machine learning, representing the accuracy of a classification model. It displays the number of true positives, true negatives, false positives, and false negatives. This matrix aids in analyzing model performance, identifying mis-classifications, and improving predictive accuracy.

A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the total number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.

For a binary classification problem, we would have a 2 x 2 matrix, as shown below, with 4 values:

Let’s decipher the matrix:

The target variable has two values: Positive or Negative
The columns represent the actual values of the target variable
The rows represent the predicted values of the target variable

But wait – what’s TP, FP, FN, and TN here? That’s the crucial part of a confusion matrix. Let’s understand each term below.

Important Terms in a Confusion Matrix

True Positive (TP)

The predicted value matches the actual value, or the predicted class matches the actual class.
The actual value was positive, and the model predicted a positive value.

True Negative (TN)

The predicted value matches the actual value, or the predicted class matches the actual class.
The actual value was negative, and the model predicted a negative value.

False Positive (FP) – Type I Error

The predicted value does not match the actual value.
The actual value was negative, but the model predicted a positive value.
Also known as the type I error.

False Negative (FN) – Type II Error

The predicted value does not match the actual value.
The actual value was positive, but the model predicted a negative value.
Also known as the type II error.
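
The four definitions above can be sketched as a small counting routine. This is only a minimal illustration in plain Python: the function name count_outcomes and the toy labels (1 = positive, 0 = negative) are assumptions made for this example and do not come from any library.

```python
def count_outcomes(actual, predicted, positive=1):
    """Count TP, TN, FP, FN for a binary problem (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1          # Type I error
        else:
            fn += 1          # Type II error
    return tp, tn, fp, fn


# Toy labels invented for illustration
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = count_outcomes(actual, predicted)
print(tp, tn, fp, fn)  # 3 3 1 1
```

Every data point falls into exactly one of the four buckets, so the four counts always add up to the size of the dataset.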

Let me give you an example to better understand this. Suppose we had a classification dataset with 1000 data points. We fit a classifier (say logistic regression or decision tree) on it and get the below confusion matrix:

The different values of the Confusion matrix would be as follows:

True Positive (TP) = 560, meaning the model correctly classified 560 positive class data points.
True Negative (TN) = 330, meaning the model correctly classified 330 negative class data points.
False Positive (FP) = 60, meaning the model incorrectly classified 60 negative class data points as belonging to the positive class.
False Negative (FN) = 50, meaning the model incorrectly classified 50 positive class data points as belonging to the negative class.

This turned out to be a pretty decent classifier for our dataset, considering the relatively larger number of true positive and true negative values.

Remember the Type I and Type II errors. Interviewers love to ask the difference between these two!

Why Do We Need a Confusion Matrix?

Before we answer this question, let’s think about a hypothetical classification problem.

Let’s say you want to predict which people are infected with a contagious virus before they show symptoms, so that they can be isolated from the healthy population (ringing any bells, yet?). The two values for our target variable would be Sick and Not Sick.

Now, you must be wondering why we need a confusion matrix when we have our all-weather friend – Accuracy. Well, let’s see where classification accuracy falters.

Our dataset is an example of an imbalanced dataset. There are 960 data points for the negative class and 40 data points for the positive class. This is how we’ll calculate the accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Let’s see how our model performed:

The total outcome values are:

TP = 30, TN = 930, FP = 30, FN = 10

So, the accuracy of our model turns out to be (30 + 930) / (30 + 930 + 30 + 10) = 960 / 1000:

96%! Not bad!
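
To see why this number can mislead, here is a small sketch that compares the model with a trivial baseline that labels every person Not Sick. The baseline is a hypothetical comparison added for illustration; the counts are the TP, TN, FP, and FN values listed above.

```python
# Counts from the virus example above
TP, TN, FP, FN = 30, 930, 30, 10
total = TP + TN + FP + FN

# Accuracy of the model: correct predictions over all predictions
model_accuracy = (TP + TN) / total

# Hypothetical baseline: predict "Not Sick" for everyone.
# Every actual negative (TN + FP) is then correct,
# and every actual positive (TP + FN) is missed.
baseline_accuracy = (TN + FP) / total
baseline_sick_found = 0

print(model_accuracy)     # 0.96
print(baseline_accuracy)  # 0.96
```

A baseline that never finds a single sick person reaches the same accuracy as the model, which is exactly why accuracy alone is not enough on imbalanced data.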

But it gives the wrong idea about the result. Think about it.

Our model is saying, “I can predict sick people 96% of the time”. However, it is doing the opposite. It predicts the people who will not get sick with 96% accuracy while the sick are spreading the virus!

Do you think this is a correct metric for our model, given the seriousness of the issue? Shouldn’t we be measuring how many positive cases we can predict correctly to arrest the spread of the contagious virus? Or maybe, out of the cases predicted as positive, how many are actually positive, to check the reliability of our model?

This is where we come across the dual concept of Precision and Recall.

How to Calculate Confusion Matrix for a 2-class Classification Problem?

To calculate the confusion matrix for a 2-class classification problem, you will need to know the following:

True positives (TP): The number of samples that were correctly predicted as positive.
True negatives (TN): The number of samples that were correctly predicted as negative.
False positives (FP): The number of samples that were incorrectly predicted as positive.
False negatives (FN): The number of samples that were incorrectly predicted as negative.

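Once you have these values, you can arrange them into the confusion matrix using the following table, where the rows are the predicted classes and the columns are the actual classes:

Predicted \ Actual	Actual Positive	Actual Negative
Predicted Positive	True positives (TP)	False positives (FP)
Predicted Negative	False negatives (FN)	True negatives (TN)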

Here is an example of how to calculate the confusion matrix for a 2-class classification problem:

# True positives (TP)
TP = 100

# True negatives (TN)
TN = 200

# False positives (FP)
FP = 50

# False negatives (FN)
FN = 150

# Confusion matrix (rows = predicted class, columns = actual class)
confusion_matrix = [[TP, FP], [FN, TN]]

The confusion matrix can be used to calculate a variety of metrics, such as accuracy, precision, recall, and F1 score.
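
As a rough sketch of that last point, the four counts from the example above can be plugged into the standard definitions of these metrics. The variable names are chosen for illustration only.

```python
# Counts from the example above
TP, TN, FP, FN = 100, 200, 50, 150

accuracy = (TP + TN) / (TP + TN + FP + FN)  # share of all predictions that are correct
precision = TP / (TP + FP)                  # share of predicted positives that are right
recall = TP / (TP + FN)                     # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.600 precision=0.667 recall=0.400 f1=0.500
```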

Precision vs. Recall

Precision tells us how many of the cases predicted as positive actually turned out to be positive.

Here’s how to calculate Precision:

Precision = TP / (TP + FP)

This would determine whether our model is reliable or not.

Recall tells us how many of the actual positive cases we were able to predict correctly with our model.

And here’s how we can calculate Recall:

Recall = TP / (TP + FN)

We can easily calculate Precision and Recall for our model by plugging the values into the above equations:

Precision = 30 / (30 + 30) = 0.5
Recall = 30 / (30 + 10) = 0.75

50% of the cases predicted as positive turned out to be actual positive cases, whereas 75% of the actual positives were successfully predicted by our model. Awesome!
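
The same arithmetic can be written as a short helper. This is a sketch under assumptions: the function name precision_recall is invented here, and returning 0.0 when a denominator is zero is one possible convention rather than a rule.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Counts from the virus example: TP = 30, FP = 30, FN = 10
precision, recall = precision_recall(tp=30, fp=30, fn=10)
print(precision, recall)  # 0.5 0.75
```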

Precision is a useful metric in cases where False Positives are a higher concern than False Negatives.

Precision is important in music or video recommendation systems, e-commerce websites, etc. Wrong results could lead to customer churn and be harmful to the business.

Recall is a useful metric in cases where False Negative trumps False Positive.

Recall is important in medical cases where it doesn’t matter whether we raise a false alarm, but the actual positive cases should not go undetected!

In our example, when dealing with a contagious virus, Recall, which measures the ability to capture all actual positives, is the better metric. We aim to avoid mistakenly releasing an infected person into the healthy population, potentially spreading the virus. This is why accuracy proves inadequate for evaluating our model, and why the Confusion Matrix, read with a focus on recall, gives a more insightful measure in such critical scenarios.

But there will be cases where there is no clear distinction between whether Precision is more important or Recall. What should we do in those cases? We combine them!

What is F1-Score?

In practice, when we try to increase the precision of our model, the recall goes down, and vice-versa. The F1-score captures both the trends in a single value:

F1-score = 2 × Precision × Recall / (Precision + Recall)

F1-score is the harmonic mean of Precision and Recall, and so it gives a combined idea about these two metrics. For a fixed sum of Precision and Recall, it is highest when the two are equal, and it is pulled toward the lower of the two when they differ.

But there is a catch here. The interpretability of the F1-score is poor. This means that we don’t know what our classifier is maximizing – precision or recall. So, we use it in combination with other evaluation metrics, giving us a complete picture of the result.
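
A short sketch can make the behaviour of the harmonic mean concrete. The helper name harmonic_f1 is hypothetical, and the second pair of values (precision 1.0, recall 0.1) is invented to show an extreme case.

```python
def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Virus example: precision 0.5, recall 0.75
print(round(harmonic_f1(0.5, 0.75), 3))  # 0.6

# Invented extreme case: perfect precision, very poor recall
print(round(harmonic_f1(1.0, 0.1), 3))   # 0.182
print((1.0 + 0.1) / 2)                   # 0.55 (arithmetic mean, for comparison)
```

The harmonic mean stays close to the weaker of the two metrics, so a model cannot get a high F1-score by doing well on only one of them.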

Confusion Matrix Using Scikit-learn in Python

You know the theory – now let’s put it into practice. Let’s code a confusion matrix with the Scikit-learn (sklearn) library in Python.

Sklearn has two great functions: confusion_matrix() and classification_report().

Sklearn confusion_matrix() returns the values of the Confusion matrix. The output is, however, slightly different from what we have studied so far. It takes the rows as Actual values and the columns as Predicted values. The rest of the concept remains the same.
Sklearn classification_report() outputs precision, recall, and f1-score for each target class. In addition to this, it also has some extra values: micro avg, macro avg, and weighted avg.

Micro average is the precision/recall/f1-score calculated globally, by pooling the true positives, false positives, and false negatives of all the classes.

Macro average is the unweighted mean of the per-class precision/recall/f1-score.

Weighted average is the mean of the per-class precision/recall/f1-score, weighted by the number of actual instances (support) of each class.
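
Here is a minimal sketch of both functions in use, assuming scikit-learn is installed. The label lists are toy data invented for illustration (1 = positive, 0 = negative).

```python
from sklearn.metrics import classification_report, confusion_matrix

# Toy labels invented for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]

# sklearn layout: rows = actual class, columns = predicted class.
# With labels=[0, 1] the matrix reads [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)

# Flatten the 2 x 2 matrix into the four familiar counts
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 4 1 1 4

# Per-class precision, recall, f1-score and support, plus the averages
print(classification_report(y_true, y_pred, labels=[0, 1],
                            target_names=["Negative", "Positive"]))
```

Because sklearn puts the actual classes on the rows, its matrix is the transpose of the layout used earlier in this article.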

Confusion Matrix for Multi-Class Classification

How would a confusion matrix in machine learning work for a multi-class classification problem? Well, don’t scratch your head! We will have a look at that here.

Let’s draw a confusion matrix for a multiclass problem where we have to predict whether a person loves Facebook, Instagram, or Snapchat. The confusion matrix would be a 3 x 3 matrix like this:

The true positive, true negative, false positive, and false negative for each class would be calculated by adding the cell values as follows:
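
The per-class bookkeeping can be sketched in plain Python as follows. The counts in the matrix are hypothetical, and the layout (rows = predicted class, columns = actual class) is assumed to follow the convention used for the 2 x 2 case earlier in this article.

```python
labels = ["Facebook", "Instagram", "Snapchat"]

# Hypothetical counts; rows = predicted class, columns = actual class
matrix = [
    [30,  4,  2],   # predicted Facebook
    [ 5, 25,  6],   # predicted Instagram
    [ 3,  7, 18],   # predicted Snapchat
]


def per_class_counts(matrix, k):
    """Return (TP, FP, FN, TN) for class index k, treating it as 'positive'."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                               # diagonal cell
    fp = sum(matrix[k][j] for j in range(n)) - tp   # rest of row k
    fn = sum(matrix[i][k] for i in range(n)) - tp   # rest of column k
    tn = total - tp - fp - fn                       # everything else
    return tp, fp, fn, tn


for k, name in enumerate(labels):
    print(name, per_class_counts(matrix, k))
# Facebook (30, 6, 8, 56)
# Instagram (25, 11, 11, 53)
# Snapchat (18, 10, 8, 64)
```

If the matrix comes from sklearn, whose rows are the actual classes, the row and column roles for FP and FN swap.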

That’s it! You are ready to decipher any N x N confusion matrix!

Conclusion

The Confusion matrix is not so confusing anymore, is it?

Hope this article gave you a solid base on how to interpret and use a confusion matrix for classification algorithms in machine learning. The matrix helps in understanding where the model has gone wrong and gives guidance to correct the path. It is a powerful and commonly used tool to evaluate the performance of a classification model.

We will soon come out with an article on the AUC-ROC curve and continue our discussion there. Until next time, don’t lose hope in your classification model; you just might be using the wrong evaluation metric!

Key Takeaways

True Positive and True Negative values mean the predicted value matches the actual value.
A Type I Error happens when the model makes an incorrect prediction, as in, the model predicted positive for an actual negative value.
A Type II Error happens when the model makes an incorrect prediction of an actual positive value as negative.

Frequently Asked Questions

Q1. How to interpret a confusion matrix?

A. A classification model’s accuracy and errors are summarized in a confusion matrix, where entries indicate true positive, true negative, false positive, and false negative cases.

Q2. What are the advantages of using Confusion matrix?

A. A thorough assessment of a classification model’s performance is given by the confusion matrix, which also helps with more in-depth analysis by providing information on true positives, true negatives, false positives, and false negatives.

Q3. What are some examples of confusion matrix applications?

A. Applications for confusion matrices can be found in many domains, such as fraud detection, sentiment analysis, medical diagnosis (determining true/false positives/negatives for illnesses), and picture recognition accuracy evaluation.

Q4. What is the confusion matrix diagram?

A. An illustration of a classification model’s performance is provided by a confusion matrix graphic. In a structured matrix format, it shows values for true positive, true negative, false positive, and false negative.

Q5. How to use the Confusion Matrix in Machine Learning?

A. You can use a confusion matrix in machine learning in 4 steps:

Train a machine learning model. This can be done using any machine learning algorithm, such as logistic regression, decision tree, or random forest.

Make predictions on a test dataset. This is data that the model has not been trained on.

Construct a confusion matrix. This can be done using a Python library such as Scikit-learn.

Analyze the confusion matrix. Look at the diagonal elements of the matrix to see how many instances the model predicted correctly. Look at the off-diagonal elements of the matrix to see how many instances the model predicted incorrectly.
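
The four steps above can be sketched end to end with scikit-learn. Everything specific here is an assumption for illustration: the synthetic dataset from make_classification stands in for real data, and logistic regression is just one of the algorithms mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary dataset standing in for real data (assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: train a model
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 2: predict on data the model has not seen
y_pred = model.predict(X_test)

# Step 3: build the confusion matrix (sklearn: rows = actual, columns = predicted)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# Step 4: analyze; the diagonal holds correct predictions, the rest are errors
correct = cm.trace()
incorrect = cm.sum() - correct
print(cm)
print("correct:", correct, "incorrect:", incorrect)
```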

Aniruddha Bhandari

I am on a journey to becoming a data scientist. I love to unravel trends in data, visualize it and predict the future with ML algorithms! But the most satisfying part of this journey is sharing my learnings, from the challenges that I face, with the community to make the world a better place!

Punit Kumar

Hi Aniruddha, Thanks for writing this. Isn't the definition of FP and FN other way around? Like False Positive (FP) – The actual value was positive but we predicted a negative value Shouldn't it be reversed?

Hey Punit, Thanks for taking out the time to read the article and pointing out the mistake. Much appreciated! Thanks Aniruddha

Ryan Tabeshi

Hi Puneet, In a FP, the value was predicted to be positive, but the value actually belonged to the negative class, so I think its correct, unless I'm missing something.

Ratheehsh

Thanks bro..You simply explained it..

Arjun Badhan

Hi Aniruddha, Thanks for the article. It is indeed informative. However, I would like to highlight something in the section with heading "Understanding True Positive, True Negative, False Positive and False Negative in a Confusion Matrix". Do you think that we might have mixed up on the second point on False Positive and False Negative.

Hi Arjun, Glad you found it useful. And you are correct in pointing out the mix-up in the definitions. Thanks for your timely intervention🙏. Aniruddha

Sagar

Nice and well written article Aniruddha. Introduced confusion matrix very well for beginners. (But I think there is one minor issue you may want to correct if you also notice it is really an issue: Under Type1 and Type2 error definitions, I think you have to swap 2nd bullet points. These do not match with matrix you mentioned earlier.)

Hey Sagar Really glad you liked the article! I have made the relevant changes. Thanks for the feedback! Aniruddha
