This article was published as a part of the Data Science Blogathon.
Companies are trying to disrupt technology and business markets with smart new products built on technologies such as artificial intelligence and machine learning. Organizations are looking for talented, experienced people to meet that demand, and data scientists, data analysts, machine learning engineers, and computer vision engineers are among the most in-demand roles today. If you want to land a job in this domain, it is crucial to know the common machine learning interview questions that recruiters ask.
This article covers some popular machine learning interview questions that push you to think one step beyond what you already know and help you move closer to your dream job.
Explore them to sharpen your interview skills and ace future interviews in the field.
Answer: A machine learning algorithm is a function that maps inputs to outputs. When you assume the form of that function in advance, it is a parametric machine learning algorithm. Linear regression is a good example: when you use it, you assume that the relationship in the data is linear, so the learned function is a straight line.
In short, an algorithm is parametric whenever you assume the form of the function, and training only estimates a fixed set of parameters. In contrast, when you make no assumption about the form of the function, it is a non-parametric ML algorithm.
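The difference can be illustrated with a minimal sketch. It assumes a toy one-dimensional dataset, and the helper names `fit_line` and `knn_predict` are hypothetical, chosen only for this example. The first model assumes a straight line and learns two parameters; the second assumes nothing about the shape and answers directly from nearby training points.

```python
# Parametric: assume y = w*x + b and estimate only the two parameters w and b.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Non-parametric: no assumed form; average the targets of the k closest points.
def knn_predict(xs, ys, query, k=2):
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / k


# Toy data (an assumption for illustration); the true relationship is y = x^2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0]

w, b = fit_line(xs, ys)
line_pred = w * 3.5 + b              # limited by the straight-line assumption
knn_pred = knn_predict(xs, ys, 3.5)  # follows the local shape of the data

print(f"line: {line_pred:.2f}, knn: {knn_pred:.2f}, true value: {3.5 ** 2:.2f}")
```

On this curved data the straight-line model is held back by its assumption, while the neighbor-based model lands closer to the true value.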
Answer: In machine learning, a loss function measures how far the predicted value is from the true value (original – predicted). The loss is therefore a single number that summarizes how well the model performs. A loss function is simply a mathematical function, so it can be plotted as a graph.
A function is convex if, for any two points on its graph, the graph between them never rises above the straight line joining those points. A convex function has a single valley, so any local minimum is also the global minimum. If the graph passes above that line somewhere, the function is non-convex and can have several local minima.
A non-convex loss function is a problem because it can have more than one minimum. An optimization algorithm such as gradient descent may then get stuck in a local minimum instead of reaching the global minimum, so you may not find the best parameters.
In the diagram above, the first figure is a convex function with a single global minimum and no other local minima. The second figure represents the loss of a complex neural network, with many local minima and one global minimum.
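A small experiment makes the point concrete. This is a sketch under stated assumptions: the two functions, the learning rate, and the starting points are all chosen for illustration and are not taken from the diagram.

```python
def gradient_descent(grad, x0, lr=0.01, steps=2000):
    """Plain gradient descent on a one-dimensional function."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# Convex example: f(x) = x^2 has a single minimum at x = 0.
def convex_grad(x):
    return 2 * x


# Non-convex example: f(x) = x^4 - 3x^2 + x has two separate minima.
def f(x):
    return x ** 4 - 3 * x ** 2 + x


def f_grad(x):
    return 4 * x ** 3 - 6 * x + 1


# On the convex function, both starting points reach the same minimum.
convex_left = gradient_descent(convex_grad, -2.0)
convex_right = gradient_descent(convex_grad, 2.0)

# On the non-convex function, the result depends on where we start.
nc_left = gradient_descent(f_grad, -2.0)   # ends in the deeper (global) minimum
nc_right = gradient_descent(f_grad, 2.0)   # stuck in the shallower local minimum

print(convex_left, convex_right)
print(nc_left, f(nc_left))
print(nc_right, f(nc_right))
```

Starting from the right, gradient descent settles in a minimum with a higher loss than the one found from the left, which is exactly the local-minimum problem described above.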
Answer: Interviewers ask this important question to check that you understand how machine learning and deep learning work. To answer it, present two sets of points to the interviewer: the factors for which you would prefer deep learning, and the factors that count against choosing deep learning for a project.
Points in favor of choosing deep learning
Points against choosing deep learning
Whenever we test a machine learning classification model, each prediction falls into one of only 4 cases, represented in the diagram below.
When the actual value is positive and the predicted value is negative, it is known as a false negative. When the actual value is negative and the predicted value is positive, it is known as a false positive.
Example – Suppose you build an email spam classifier. The model can make 2 types of mistakes. When a mail is spam and the model says not-spam, that is a false negative. When a mail is not-spam and the model says spam, that is a false positive. In this case, a false positive is more costly, because a legitimate mail may contain important or confidential information and it would be lost if it were marked as spam.
Now consider a covid detection system. If a person is infected and the system reports not infected, that is a false negative. Here the false negative is more costly, because an infected person who is left to move freely can infect others.
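The four cases can be counted directly from a list of actual and predicted labels. The sketch below uses made-up labels, and `confusion_counts` is a hypothetical helper written for this example. Precision and recall are included as the two standard metrics that react to false positives and false negatives respectively.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four possible outcomes of a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # true positive
        elif a != positive and p == positive:
            fp += 1          # false positive
        elif a == positive and p != positive:
            fn += 1          # false negative
        else:
            tn += 1          # true negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Made-up labels for illustration: 1 = positive (spam / infected), 0 = negative.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 0]

counts = confusion_counts(actual, predicted)

# Precision drops when false positives rise (the costly error for spam).
precision = counts["TP"] / (counts["TP"] + counts["FP"])
# Recall drops when false negatives rise (the costly error for covid detection).
recall = counts["TP"] / (counts["TP"] + counts["FN"])

print(counts, precision, recall)
```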
Naive Bayes is a popular machine learning algorithm based on Bayes' theorem. Naive means innocent (simple). It is a supervised machine learning algorithm where you have many input columns (independent variables) and one output column. It is called naive because it assumes that the input columns are independent of each other. In real data the columns usually have some relationship, but naive Bayes ignores it and treats them as unrelated.
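The independence assumption is what keeps the calculation simple. In the notation assumed here for illustration, y is the output class and x_1, ..., x_n are the input columns; the probability of a class given the inputs factorizes into one term per column:

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Each factor P(x_i | y) can be estimated from the training data one column at a time, and the predicted class is the y with the largest product.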
You should know about mean and median if you have read basic descriptive statistics. The mean is the average of all the observations (the total sum divided by the total number of observations). The median is the middle value after sorting all the observations. Both measure the central tendency of the data, but the mean is sensitive to extreme values, so using the mean is not recommended when the data has outliers.
For example, suppose we have a dataset of students with their annual packages. Most students got a package between 3 and 6 LPA, but 2-3 students got packages such as 25 LPA and 38 LPA. If we are asked for the average class package, the mean is pulled far above what a typical student received, which is misleading. It is better to use the median in such cases.
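A quick check with Python's standard `statistics` module shows the effect. The package values below are made up to match the description: most between 3 and 6 LPA, plus two outliers.

```python
from statistics import mean, median

# Hypothetical annual packages in LPA: six typical values and two outliers.
packages_lpa = [3, 4, 4, 5, 5, 6, 25, 38]

average_package = mean(packages_lpa)    # pulled upward by 25 and 38
middle_package = median(packages_lpa)   # stays with the typical students

print(average_package)  # 11.25
print(middle_package)   # 5.0
```

The mean of 11.25 LPA is higher than what six of the eight students received, while the median of 5 LPA describes the typical student well.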
This is a rarely asked question but very important to understand. Research and practical experience with different machine learning algorithms suggest that as the amount of data increases, even weak ML algorithms perform better. Put simply, if you do not focus on the algorithmic part and instead invest heavily in building the corpus, even a simple algorithm can produce good results. This is called the unreasonable effectiveness of data.
In practice, different algorithms give different performance on the same problem statement, as the graph below reflects, but a large enough amount of data changes the whole picture.
Lazy learning algorithms do not learn anything in the training phase; they do the work (learning) in the prediction phase, when they receive a query. In contrast, eager learning algorithms learn during the training phase, building a function from input to output before any query arrives.
KNN is a lazy learning algorithm because it only stores the training data. When a new query arrives, it computes the distance from the new data point to every stored data point and picks the k nearest ones, for example 3. Among those 3 data points it takes a majority vote, and the class with the most votes is the prediction. In this entire process, KNN starts working only when it receives the query; before that, it does nothing.
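The lazy behavior is easy to see in a minimal sketch. The class name `LazyKNN` and the toy points are assumptions made for this example: `fit` only stores the data, and all distance computation and voting happen inside `predict`.

```python
import math
from collections import Counter


class LazyKNN:
    """A minimal k-nearest-neighbors classifier for illustration."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" is only storage; nothing is learned here.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the work happens at query time: measure the distance to every
        # stored point, keep the k nearest, and take a majority vote.
        distances = sorted(
            (math.dist(point, query), label)
            for point, label in zip(self.points, self.labels)
        )
        votes = Counter(label for _, label in distances[: self.k])
        return votes.most_common(1)[0][0]


# Toy 2-D data: class "A" clusters near (1, 1), class "B" near (6, 6).
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

model = LazyKNN(k=3).fit(points, labels)
prediction_near_a = model.predict((2, 2))
prediction_near_b = model.predict((6, 5))
print(prediction_near_a, prediction_near_b)
```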
If you have learned about the types of machine learning, you usually hear of 3 types: supervised, unsupervised, and reinforcement learning. There is one more type, known as semi-supervised machine learning. Semi-supervised simply means partially supervised and partially unsupervised. Obtaining the output column (labels) is costly and time-consuming because it requires human effort: when the dataset is first prepared, a person has to sit down and provide the labels. So researchers proposed labeling only a limited amount of data by hand and labeling the remaining data automatically. This is the core idea behind semi-supervised learning.
For example, Google Photos uses semi-supervised machine learning to label photos. Once one photo is identified, the same label is applied to similar ones, which makes labeling efficient.
OOB stands for out-of-bag evaluation. When we train with a bagging algorithm, each training sample is drawn using sampling with replacement. Some rows are selected multiple times in a sample, and some rows are not selected at all; the unselected rows are known as out-of-bag rows. We can use these rows as test data without setting aside a separate test set, which is known as out-of-bag evaluation.
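The sampling step can be sketched in a few lines. The dataset size and the seed are arbitrary choices for this illustration; the rows are represented only by their ids.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

n_rows = 1000
row_ids = list(range(n_rows))

# One bootstrap sample: draw n_rows ids with replacement.
bag = [random.choice(row_ids) for _ in range(n_rows)]

# Rows that were never drawn are the out-of-bag rows for this sample.
out_of_bag = sorted(set(row_ids) - set(bag))
oob_fraction = len(out_of_bag) / n_rows

print(f"{len(out_of_bag)} of {n_rows} rows are out of bag ({oob_fraction:.1%})")
```

For a large dataset, roughly 37% of the rows end up out of bag in each sample, because the chance that a given row is never drawn is (1 - 1/n)^n, which approaches 1/e. A model trained on the bag can be scored on these rows.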
This question can be asked to check your practical knowledge of machine learning. A decision tree is a simple algorithm built with a method such as ID3 or CART, and a random forest is a collection of multiple decision trees. Practically speaking, on most datasets a random forest performs better than a single decision tree. But there are some situations where a decision tree is more useful than a random forest.
Logistic regression works much like the linear regression model. The only difference is that you apply the sigmoid function to the output to calculate a probability, and then apply a threshold to give the result as 0 or 1. It is called regression because it calculates a continuous value, the probability. An algorithm that calculates a continuous value is called a regression algorithm, so logistic regression carries the name even though it is used for classification.
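The two steps, a continuous probability followed by a threshold, can be sketched as follows. The weight `w` and bias `b` are hypothetical values picked for illustration, not learned from any data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    # Same linear part as linear regression, then the sigmoid on top.
    return sigmoid(w * x + b)


def predict_label(x, w, b, threshold=0.5):
    # The threshold turns the continuous probability into a 0/1 class.
    return 1 if predict_proba(x, w, b) >= threshold else 0


# Hypothetical parameters, assumed for illustration only.
w, b = 1.5, -3.0

p_low = predict_proba(1.0, w, b)    # linear output is -1.5
p_high = predict_proba(3.0, w, b)   # linear output is +1.5
label_low = predict_label(1.0, w, b)
label_high = predict_label(3.0, w, b)

print(p_low, label_low)
print(p_high, label_high)
```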
Many companies promote their products with a tagline along the lines of: the more you use our product, the more intelligent it gets. What they are pointing to is online machine learning. First, let us understand batch machine learning (offline ML). In batch machine learning, you have a dataset, train an ML model on the entire data, and deploy the model on the server. After that, if you want to make changes, you take the model down, retrain it, and deploy it again. This is offline or batch machine learning.
Online machine learning is the type of learning where model training continues on the server after deployment. It is also known as incremental learning. As new data arrives, the model performs two tasks: first it predicts the outcome, and second it gets trained on the new data. In this way the model's performance gradually improves with time. A good example of online machine learning is a recommendation system. YouTube is a great example: the feed changes when you watch a video and return to the home page.
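The predict-then-train loop can be sketched with a tiny linear model updated by one gradient step per example. Everything here is an assumption made for illustration: the class `OnlineLinearModel`, the learning rate, and the simulated data stream that follows y = 2x + 1.

```python
class OnlineLinearModel:
    """A linear model y = w*x + b that learns from one example at a time."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # Task 1: predict the outcome for the incoming example.
        error = self.predict(x) - y
        # Task 2: train on that same example with one gradient step.
        self.w -= self.lr * error * x
        self.b -= self.lr * error
        return error


model = OnlineLinearModel()
abs_errors = []

# Simulated stream of incoming data; the true rule y = 2x + 1 is hypothetical.
for i in range(5000):
    x = (i % 10) / 10
    y = 2 * x + 1
    abs_errors.append(abs(model.learn_one(x, y)))

print(f"w={model.w:.3f}, b={model.b:.3f}")
print(f"first error={abs_errors[0]:.3f}, last error={abs_errors[-1]:.6f}")
```

The model is never taken offline or retrained from scratch; its error shrinks as more examples stream in, which is the gradual improvement described above.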
This is an interesting question about a theorem in machine learning. In 1996, the well-known computer scientist David Wolpert published a paper containing the No Free Lunch Theorem. According to this theorem, if you make no assumptions about the problem, you cannot say in advance which ML model to pick: no single model is best for every type of data.
This is an important question when you prepare for Machine learning interviews because it checks your practical knowledge about handling a massive amount of data. So there are 3 methods that you can use in this kind of scenario.
The data you receive in machine learning is of two types: structured and unstructured.
Structured – Data in tabular form is known as structured data. Tabular means the data is organized into rows and columns; data in an Excel sheet is structured data. Each column holds values of a defined type, such as numbers or text. Searching in structured data is simple. Traditional ML algorithms are easily applicable to structured data. Structured data is mainly used in the analytics domain.
Unstructured – Unstructured data has no fixed organization and comes as different types of files, such as images, audio, video, GIFs, and text files. Searching becomes difficult in unstructured data. Deep learning techniques are mainly used here. Unstructured data is used in NLP, text mining, and computer vision.
Most high-performing models in machine learning are obtained from bagging or boosting. In many interviews, the candidate is asked to list the main points of difference between the two techniques. Ensemble learning became popular because we want a model with both low bias and low variance, whereas a single ML model often gives low bias only at the cost of high variance. We will discuss 3 main points of difference to answer the question correctly and accurately.
This is a very important and basic machine learning question. Interviewers often start with questions on linear regression to make an approximate judgment of your practical knowledge. Below are the 5 main assumptions of linear regression.
Correlation describes the strength and direction of the relationship between two variables, which can be positively or negatively correlated. It is used to quantify the relationship between two variables, for example income and expenditure, or demand and supply.
Covariance is a simple way to measure how two variables vary together. The problem with covariance is that its value depends on the units of the variables, so covariances are hard to compare without normalization. Correlation is the normalized form of covariance.
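The effect of units can be checked with a short sketch. The income and expenditure figures are made up for illustration, and the two helper functions implement the usual sample covariance and the Pearson correlation by hand.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: income and expenditure, both in thousands.
income = [30, 40, 50, 60, 70]
expenditure = [20, 26, 33, 41, 45]

# The same incomes expressed in single units instead of thousands.
income_in_units = [value * 1000 for value in income]

cov_thousands = covariance(income, expenditure)
cov_units = covariance(income_in_units, expenditure)
corr_thousands = correlation(income, expenditure)
corr_units = correlation(income_in_units, expenditure)

print(cov_thousands, cov_units)     # covariance changes with the units
print(corr_thousands, corr_units)   # correlation does not
```

Changing the unit of income multiplies the covariance by 1000, while the correlation stays the same and always lies between -1 and 1.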
Bias and variance are both types of error in ML algorithms. Bias arises from overly simplistic assumptions made by the machine learning algorithm. When the model does not perform well even on the training data, it has high bias, and the condition is called underfitting.
Variance is an error that arises from the complexity of the algorithm. When the model fits the training data too closely (overfitting) and cannot predict well on new data, it has high variance.
We need to trade off bias against variance to minimize the total error.
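The trade-off is often summarized by the standard decomposition of the expected squared error. The notation is assumed here for illustration: the data follows y = f(x) + ε with noise variance σ², and f̂ is the model learned from a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making the model more complex usually lowers the bias term and raises the variance term, so the total error is smallest somewhere in between.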
Machine learning is a vast field, and everything is connected. In this article, we have covered some practical questions that check your practical and research knowledge of algorithms. When preparing for machine learning interviews, it is crucial to understand how each algorithm works, observe its behavior when you use it, and analyze the outcomes. These questions aim to assess your hands-on experience and problem-solving skills. Let us conclude the article with key takeaways that will help you prepare better for machine learning jobs.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.