This article was published as a part of the Data Science Blogathon.
We, as data science and machine learning enthusiasts, have learned about various algorithms like Logistic Regression, Linear Regression, Decision Trees, Naive Bayes, etc. But at the same time, are we preparing for the interviews? As we know, the end goal is to land our dream job at the companies we are aiming for. Knowing how interviewers twist and turn their questions is therefore very important for answering them efficiently. For that reason, I’m starting a series on the Top 10 most frequently asked interview questions on various machine learning algorithms.
In this article, we will be covering the top 10 interview questions on the Naive Bayes classifier, but we are not going to jump straight into those tricky questions; instead, let’s first build a high-level understanding of this algorithm so that the concept behind it is clear.
Naive Bayes is often considered a go-to choice when dealing with classification problems, and it is rooted in the concept of probabilities. Specifically, this algorithm is derived from Bayes’ theorem. But you must be thinking: if it is based on Bayes’ theorem, why does it carry the prefix “Naive”, which means “Dumb”? So is this algorithm dumb or useful?
The answer is simple and pretty straightforward: this algorithm is not at all naive. It is quite useful, and simple when compared to other, more complex algorithms. It is called naive Bayes because of the simplifying assumption it makes, which takes us to our very first interview question:
If one wants to give the short answer, then they can simply say: “Features are independent.” But this will not be sufficient, so we need to explain the answer a little further. Naive Bayes assumes beforehand that all the features are independent of each other given the class, and it treats each of them separately, so every feature contributes independently to the final result. This assumption is known as the conditional independence assumption.
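To make the independence assumption concrete, here is a minimal sketch of how the class score factorises into a prior multiplied by one likelihood per feature. The spam/ham classes, the words, and every probability below are made-up values for illustration only.

```python
from math import prod

# Hypothetical prior probabilities for a toy spam filter.
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical P(word appears | class); each word is treated independently.
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.2, "meeting": 0.5},
}


def naive_posterior(observed_words):
    """Return P(class | words) assuming features are independent given the class."""
    # Unnormalised score: prior times the product of per-feature likelihoods.
    scores = {
        label: priors[label] * prod(likelihoods[label][w] for w in observed_words)
        for label in priors
    }
    total = sum(scores.values())
    # Normalise so the class probabilities sum to 1.
    return {label: score / total for label, score in scores.items()}


posterior = naive_posterior(["free", "meeting"])
print(posterior)
```

Because the score is a plain product, no interaction between "free" and "meeting" is ever modelled; that is exactly what the "naive" label refers to.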
Naive Bayes is one of the algorithms that can handle missing data on its own. The reason is that in this algorithm all the attributes are handled separately, both during model construction and at prediction time. If a data point is missing a value for a certain feature, that feature can simply be ignored when the probability for each class is calculated, which lets the algorithm deal with missing data in the model-building phase itself. Do refer to this amazing tutorial for a better understanding.
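The idea of ignoring a missing feature can be sketched as below: the factor for that feature is simply left out of the product. This is only an illustration under assumed numbers; the feature names, classes, and probabilities are hypothetical, and `None` is used here as the marker for a missing value.

```python
# Hypothetical priors and per-feature likelihood tables P(value | class).
priors = {"yes": 0.5, "no": 0.5}
likelihoods = {
    "yes": {
        "outlook": {"sunny": 0.2, "rainy": 0.8},
        "windy": {"true": 0.3, "false": 0.7},
    },
    "no": {
        "outlook": {"sunny": 0.6, "rainy": 0.4},
        "windy": {"true": 0.5, "false": 0.5},
    },
}


def predict(sample):
    """Return P(class | sample); features whose value is None are skipped."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature, value in sample.items():
            if value is None:
                continue  # missing value: leave this feature's factor out
            score *= likelihoods[label][feature][value]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


full = predict({"outlook": "sunny", "windy": "true"})
partial = predict({"outlook": "sunny", "windy": None})
print(full, partial)
```

The prediction for the incomplete sample still works; it just relies on the prior and the one feature that was observed.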
Naive Bayes is a probabilistic machine learning algorithm, and it is widely used in many classification tasks.
The straightforward answer is: Naive Bayes is a generative type of classifier. But this information is not enough. We should also know what a generative type of classifier is.
Generative: This type of classifier learns a model of how the data is generated behind the scenes by estimating the distribution of the features for each class. It then uses that estimated distribution to predict unseen data. The same goes for the NB classifier: it learns from the distribution of the data rather than directly learning a decision boundary to separate the classes.
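One way to see what "generative" means is that a fitted model can produce new data points, not just label existing ones. The sketch below assumes a single numeric feature modelled with one Gaussian per class; the tiny dataset and the helper names `fit` and `sample` are hypothetical.

```python
import random
from statistics import mean, stdev

# Hypothetical 1-D training data: (feature value, class label).
data = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (3.0, "b"), (3.3, "b"), (2.7, "b")]


def fit(rows):
    """Estimate a class prior plus a Gaussian (mean, std) for each class."""
    model = {}
    for label in sorted({y for _, y in rows}):
        xs = [x for x, y in rows if y == label]
        model[label] = {
            "prior": len(xs) / len(rows),
            "mean": mean(xs),
            "std": stdev(xs),
        }
    return model


def sample(model, rng):
    """Generate a new (value, label) pair from the learned distributions."""
    labels = list(model)
    # First draw a class according to the priors, then a value from its Gaussian.
    label = rng.choices(labels, weights=[model[l]["prior"] for l in labels])[0]
    value = rng.gauss(model[label]["mean"], model[label]["std"])
    return value, label


model = fit(data)
rng = random.Random(0)
generated = [sample(model, rng) for _ in range(5)]
print(generated)
```

A purely discriminative classifier has no equivalent of `sample`, because it only models which side of a boundary a point falls on.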
Prior probability: This can also be described as the initial probability. In Bayesian statistics, it is the probability assigned before any data has been collected. That’s why it is known as the “Prior” probability. It reflects what we believe about the outcome before the experiment is performed.
Posterior probability: In simple words, this is the probability that we get after a few experiment trials. It is obtained by updating the prior probability with the observed evidence. For that reason, it is also known as the updated probability.
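The move from prior to posterior is just Bayes’ theorem applied to new evidence. Here is a small worked sketch for a two-outcome case; the function name `update` and all the numbers are assumptions chosen for illustration.

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the posterior P(hypothesis | evidence) via Bayes' theorem."""
    # Total probability of seeing the evidence under both possibilities.
    evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / evidence


# Hypothetical numbers: 20% of emails are spam before we look at any content.
prior_spam = 0.2
# The word "free" shows up in 60% of spam and 5% of non-spam emails.
after_one = update(prior_spam, 0.6, 0.05)
# The posterior becomes the new prior when a second, similar clue is observed.
after_two = update(after_one, 0.6, 0.05)

print(prior_spam, after_one, after_two)
```

Each observation pushes the probability further from the initial belief, which is why the posterior is described as the updated probability.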
Naive Bayes uses separate, dedicated distributions for categorical and numerical features, so it can deal with either type of value.
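As a rough sketch of how the two kinds of feature can be scored, the code below assumes a Gaussian density for numerical features and Laplace-smoothed category frequencies for categorical ones. These are common choices, not necessarily the exact ones the author has in mind, and the function names and sample values are hypothetical.

```python
from collections import Counter
from math import exp, pi, sqrt


def gaussian_likelihood(x, mean, std):
    """Density of a normal distribution at x, used as P(x | class) for numbers."""
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))


def categorical_likelihood(value, observed_values, n_categories, alpha=1.0):
    """Smoothed P(value | class) from the category counts seen in one class."""
    counts = Counter(observed_values)
    # alpha > 0 keeps unseen categories from getting a probability of zero.
    return (counts[value] + alpha) / (len(observed_values) + alpha * n_categories)


# Numeric feature: a height of 170 in a class with mean 170 and std 10.
p_numeric = gaussian_likelihood(170.0, mean=170.0, std=10.0)

# Categorical feature: colours observed for one class, three possible colours.
colours = ["red", "red", "blue"]
p_seen = categorical_likelihood("red", colours, n_categories=3)
p_unseen = categorical_likelihood("green", colours, n_categories=3)

print(p_numeric, p_seen, p_unseen)
```

Either value can be plugged in as the per-feature factor when the class score is computed.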
So we have reached the last section of this article, having completed the top 10 interview questions on the NB classifier. In this segment, we briefly go over everything again so that we can list our learnings in a nutshell.
I hope you liked my article on the Top 10 most frequently asked interview questions on the Naive Bayes classifier. If you have any opinions or questions, then comment below.
Connect with me on LinkedIn for further discussion.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.