Naive Bayes Algorithm: A Complete guide for Data Science Enthusiasts

Anshul Last Updated : 12 Dec, 2024

8 min read

In this article, you will explore the Naive Bayes algorithm in machine learning, understand a practical Naive Bayes algorithm example, learn how it is applied in data mining, and discover how to implement the Naive Bayes algorithm in Python for various classification tasks.

Naive Bayes is a simple yet powerful probabilistic machine learning model commonly used for classification tasks, especially with large datasets. It operates based on the Naive Bayes theorem, assuming that features are independent, meaning changing the value of one feature doesn’t affect another. Despite this unrealistic assumption, which gives it the name “Naive,” the model remains popular due to its fast prediction capabilities and suitability for both binary and multi-class classification problems.

Before we dive deeper into this topic we need to understand what is “Conditional probability”, what is “Bayes’ theorem” and how conditional probability help’s us in Bayes’ theorem.

This article was published as a part of the Data Science Blogathon

What is Naive Bayes Algorithm?
Naive Bayes Algorithm in Machine Learning
Conditional Probability for Naive Bayes
Bayes’ Rule
What is Naive Bayes?
- Naive Bayes Example
- Assumptions of Naive Bayes
Gaussian Naive Bayes
Endnotes
Frequently Asked Questions

What is Naive Bayes Algorithm?

The Naive Bayes algorithm is a popular and simple classification algorithm used in machine learning. It works by calculating the probability of an item belonging to a certain class based on its features.

Read this article about the Naive Bayes Classifier Explained With Practical Problems

Naive Bayes Algorithm in Machine Learning

Naive Bayes is a simple but powerful method in machine learning used for guessing categories of things. Imagine sorting emails into spam or inbox. Naive Bayes looks at each word (like a clue) and predicts how likely it is to be spam based on past emails. It assumes these words aren’t connected (not always true!), but it’s fast and works well, making it a popular choice for many tasks.sharemore_vert

Conditional Probability for Naive Bayes

Conditional probability is defined as the likelihood of an event or outcome occurring, based on the occurrence of a previous event or outcome. Conditional probability is calculated by multiplying the probability of the preceding event by the updated probability of the succeeding, or conditional, event.

Let’s start understanding this definition with examples.

Suppose I ask you to pick a card from the deck and find the probability of getting a king given the card is clubs.

Observe carefully that here I have mentioned a condition that the card is clubs.

Now while calculating the probability my denominator will not be 52, instead, it will be 13 because the total number of cards in clubs is 13.

Since we have only one king in clubs the probability of getting a KING given the card is clubs will be 1/13 = 0.077.

Let’s take one more example,

Consider a random experiment of tossing 2 coins. The sample space here will be:

S = {HH, HT, TH, TT}

If a person is asked to find the probability of getting a tail his answer would be 3/4 = 0.75

Now suppose this same experiment is performed by another person but now we give him the condition that both the coins should have heads. This means if event A: ‘Both the coins should have heads’, has happened then the elementary outcomes {HT, TH, TT} could not have happened. Hence in this situation, the probability of getting heads on both the coins will be 1/4 = 0.25

From the above examples, we observe that the probability may change if some additional information is given to us. This is exactly the case while building any machine learning model, we need to find the output given some features.

Mathematically, the conditional probability of event A given event B has already happened is given by:

conditional probability | Naive Bayes Algorithm

Bayes’ Rule

Now we are prepared to state one of the most useful results in conditional probability: Bayes’ Rule.

Bayes’ theorem which was given by Thomas Bayes, a British Mathematician, in 1763 provides a means for calculating the probability of an event given some information.

Mathematically Bayes’ theorem can be stated as:

Basically, we are trying to find the probability of event A, given event B is true.

Here P(B) is called prior probability which means it is the probability of an event before the evidence

P(B|A) is called the posterior probability i.e., Probability of an event after the evidence is seen.

With regards to our dataset, this formula can be re-written as:

Y: class of the variable

X: dependent feature vector (of size n)

What is Naive Bayes?

Bayes’ rule provides us with the formula for the probability of Y given some feature X. In real-world problems, we hardly find any case where there is only one feature.

When the features are independent, we can extend Bayes’ rule to what is called Naive Bayes which assumes that the features are independent that means changing the value of one feature doesn’t influence the values of other variables and this is why we call this algorithm “NAIVE”

Naive Bayes can be used for various things like face recognition, weather prediction, Medical Diagnosis, News classification, Sentiment Analysis, and a lot more.

When there are multiple X variables, we simplify it by assuming that X’s are independent, so

bayes rule for multiple X | Naive Bayes Algorithm

For n number of X, the formula becomes Naive Bayes:

Which can be expressed as:

Since the denominator is constant here so we can remove it. It’s purely your choice if you want to remove it or not. Removing the denominator will help you save time and calculations.

This formula can also be understood as:

There are a whole lot of formulas mentioned here but worry not we will try to understand all this with the help of an example.

Naive Bayes Example

Let’s take a dataset to predict whether we can pet an animal or not.

Assumptions of Naive Bayes

· All the variables are independent. That is if the animal is Dog that doesn’t mean that Size will be Medium

· All the predictors have an equal effect on the outcome. That is, the animal being dog does not have more importance in deciding If we can pet him or not. All the features have equal importance.

We should try to apply the Naive Bayes Classifier formula on the above dataset however before that, we need to do some precomputations on our dataset.

We need to find P(x_i|y_j) for each x_iin X and each y_j in Y. All these calculations have been demonstrated below:

We also need the probabilities (P(y)), which are calculated in the table below. For example, P(Pet Animal = NO) = 6/14.

Now if we send our test data, suppose test = (Cow, Medium, Black)

Probability of petting an animal :

And the probability of not petting an animal:

We know P(Yes|Test)+P(No|test) = 1

So, we will normalize the result:

normalize the result | probability of not petting an animal

We see here that P(Yes|Test) > P(No|Test), so the prediction that we can pet this animal is “Yes”.

Gaussian Naive Bayes

So far, we have discussed how to predict probabilities if the predictors take up discrete values. But what if they are continuous? For this, we need to make some more assumptions regarding the distribution of each feature. The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of P(xi | y). Here we’ll discuss Gaussian Naïve Bayes.

Gaussian Naïve Bayes is used when we assume all the continuous variables associated with each feature to be distributed according to Gaussian Distribution. Gaussian Distribution is also called Normal distribution.

The conditional probability changes here since we have different values now. Also, the (PDF) probability density function of a normal distribution is given by:

We can use this formula to compute the probability of likelihoods if our data is continuous.

Endnotes

Naive Bayes algorithms are mostly used in face recognition, weather prediction, Medical Diagnosis, News classification, Sentiment Analysis, etc. In this article, we learned the mathematical intuition behind Naive bayes algorithm in Machine learning. You have already taken your first step to master this algorithm and from here all you need is practice this.

What are the two types of Naive Bayes?

Frequently Asked Questions

Q1. Why is Naive Bayes algorithm used?

A. The Naive Bayes algorithm is used due to its simplicity, efficiency, and effectiveness in certain types of classification tasks. It’s particularly suitable for text classification, spam filtering, and sentiment analysis. It assumes independence between features, making it computationally efficient with minimal data. Despite its “naive” assumption, it often performs well in practice, making it a popular choice for various applications.

Q2. What is the Naive Bayes algorithm?

A. The Naive Bayes algorithm is a probabilistic classification technique based on Bayes’ theorem. It assumes that all features in the data are independent of each other, given the class label. It calculates the probability of a particular class for a given set of features and selects the class with the highest probability as the predicted class. It’s commonly used in text classification and spam filtering tasks.

Q3. What is Naive Bayes Classifier?

Naive Bayes is a simple yet powerful machine learning algorithm for classification. It uses Bayes theorem to predict the class of something (like spam or not spam) based on its features (like words in an email). It’s popular for its speed and accuracy, especially in text classification tasks.
pen_spark

Q4. What are the two types of Naive Bayes?

Multinomial Naive Bayes: Used for text classification where features are word counts or frequencies.
Gaussian Naive Bayes: Used when features are continuous and follow a normal distribution.

Q5. What is a naive algorithm?

naive algorithm is a simple and straightforward method that solves problems using basic logic without considering optimization or efficiency. It often serves as a baseline for comparison with more advanced algorithms.

Anshul

I have recently graduated with a Bachelor's degree in Statistics and am passionate about pursuing a career in the field of data science, machine learning, and artificial intelligence. Throughout my academic journey, I thoroughly enjoyed exploring data to uncover valuable insights and trends.

I am eager to continue learning and expanding my knowledge in the field of data science. I am particularly interested in exploring deep learning and natural language processing, and I am constantly seeking out new challenges to improve my skills. My ultimate goal is to use my expertise to help businesses and organizations make data-driven decisions and drive growth and success.

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Jayesh

I am Phd Student. My phd theses title is "prediction of student performance in higher education using machine learning" Can you please help how to build model using different ML algorithm to predict the performance of student

Jayasurya

if suppose my test is animal cat size medium color grey but here no probability value of cat and grey now how can I get correct classification?

Ignus

What is the p(Test)?

Reading list

Basics of Machine Learning

Machine Learning Lifecycle

Importance of Stats and EDA

Understanding Data

Probability

Exploring Continuous Variable

Exploring Categorical Variables

Missing Values and Outliers

Central Limit theorem

Bivariate Analysis Introduction

Continuous - Continuous Variables

Continuous Categorical

Categorical Categorical

Multivariate Analysis

Different tasks in Machine Learning

Build Your First Predictive Model

Evaluation Metrics

Preprocessing Data

Linear Models

KNN

Selecting the Right Model

Feature Selection Techniques

Decision Tree

Feature Engineering

Naive Bayes

Multiclass and Multilabel

Basics of Ensemble Techniques

Advance Ensemble Techniques

Hyperparameter Tuning

Support Vector Machine

Advance Dimensionality Reduction

Unsupervised Machine Learning Methods

Recommendation Engines

Improving ML models

Working with Large Datasets

Interpretability of Machine Learning Models

Automated Machine Learning

Model Deployment

Deploying ML Models

Embedded Devices

Naive Bayes Algorithm: A Complete guide for Data Science Enthusiasts

Table of contents

What is Naive Bayes Algorithm?

Naive Bayes Algorithm in Machine Learning

Conditional Probability for Naive Bayes

Bayes’ Rule

What is Naive Bayes?

Naive Bayes Example

Assumptions of Naive Bayes

Gaussian Naive Bayes

Endnotes

Frequently Asked Questions

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid