7 Proven Steps to Impress the Recruiter with Your Machine Learning Projects

Manpreet Last Updated : 27 Mar, 2021

9 min read

This article was published as a part of the Data Science Blogathon.

Introduction

After arming yourself with all the relevant industry skills and putting hours of your time, energy, and soul into your projects, comes the most daunting task – “APPLYING FOR A JOB”. Fortunately, your profile appears at the top of the list and you get shortlisted for the interview. WHAT NEXT?

The first thing that comes to mind is revising all the concepts and going through all that Machine Learning jargon, but there’s another ‘must-do’ that should be on your preparation list, one with the maximum potential to bring the trophy home (a.k.a. to help you bag that job) – ‘REVISING YOUR PROJECTS’.

ML Projects image — SOURCE: Author’s humorous efforts

There is no escaping talking about one of your projects. You may know all the Machine Learning hacks and concepts, but they are hardly of any use if none of them ever sees the face of a Machine Learning model (i.e. if you never execute on those skills and come up with solutions that matter). One sure-shot question you’ll always face in a Data Science interview will be about your projects. Recruiters specifically ask this question to know:

1. how effectively you can put your Machine Learning skills to use,
2. how much effort you’ve put into building a project,
3. your thought process and capability to navigate to a solution,
4. whether you really understand what Machine Learning truly is, and
5. your ability to communicate your methodology and solution to the clients.

This article keeps these 5 points in focus throughout.

This question may seem a bit daunting to answer at first but with a clear understanding of the project and a concise way of talking about it, this question can turn out to be the only question that lets you steer the interview in your favor and impress the recruiters with what you already know. This question can be thrown your way in any of the following ways:

1. Can you walk us through one of your projects?
2. Can you give us a brief of an exciting project you’ve worked on?
3. What’s the most recent project you’ve worked on?

There is no one-size-fits-all answer to this question, but some structure can be brought to it and worked upon based on the nature of the project, its complexity, and your view of the problem statement.

This article lists 7 basic steps (points) to keep in mind while structuring your answer and briefing the interviewer about your project. The steps are presented in a linear order, and working through them can help you form a concrete version of your response.

You don’t have to follow them in that order, though; modify your responses according to your needs.

1. Select the right project

This is the most crucial step in the process. It is not a great practice to talk about projects which are irrelevant to the company you’ve applied to.

Suppose the company builds robots that interact with the user and act accordingly. Intuitively, you know that such a company deals with a lot of Natural Language Processing (NLP). In such a case, it is almost irrelevant to talk about a project that predicts house prices based on some numerical features.

Choosing a relevant project, one whose use case would interest the company and aid its operations, is the wisest choice you can make. Adding irrelevant projects will only indicate that you cannot prioritize well. The selected projects could come from your current organization, internships, or datasets chosen from online platforms (e.g. Kaggle, UCI ML Repository, etc.).

2. Brief about the project (and relevant stakeholders!)

There has to be absolute clarity when briefing the interviewer about your project, as it is the first step that will grab their attention. Explain in simple words what you were trying to achieve with this specific project. About 5-7 lines should suffice to explain the problem statement at hand.

Specifying the stakeholders of the project will indicate to the recruiter that you have enough knowledge of the project and know exactly how it can be implemented in a real scenario. It also showcases your business acumen: how well you know the problem and how you can articulate it to the stakeholders in a way that gives them key insights.

It could be a regression problem predicting the value of air tickets at a specific time of the day or a classification problem predicting whether a customer will buy the insurance provided by the company or not. Having a thorough knowledge of the problem statement will help you think of the key stakeholders involved.

“A stakeholder is someone who will be directly affected by the findings or the predictions at the end of the project cycle.”

In the case of the air-ticket prediction problem, the stakeholders could be:

1. a traveler who is comparing prices of flights at different times on a ticket-booking website,
2. the team members who want to ensure that they display the lowest prices on the website to keep attracting customers,
3. the marketing team that wants to use the cheapest price predictions as their USP (Unique Selling Point) and market the service accordingly.

There can be various other stakeholders involved apart from the ones mentioned above.

3. What does one single row feature?

One single row represents exactly what the problem is trying to solve. A single row comprises all the features used and the dependent target variable that the Machine Learning model will predict. One way to talk about it is to start with the dependent (target) variable and explain what the final prediction would look like. The features can then be talked about by dividing them into categorical features and numerical features.

ML projects Flight Price Prediction Row — SOURCE: Flight Price Prediction Row

You might ask, ‘Why is this even needed?’ It’s because experience has shown that talking about every single column (during EDA) doesn’t give the exact gist of the features involved, and you might end up going in circles in your own explanation while trying to figure out which column to talk about next. Focusing on just one row, on the other hand, makes the explanation much more concise and easier for the recruiter to comprehend.
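To make the one-row idea concrete, here is a minimal sketch in Python. The column names and values are hypothetical, invented only for illustration; they are not taken from any real dataset. It starts from the target variable and then groups the remaining features into categorical and numerical ones, mirroring the order suggested above.

```python
# One hypothetical row from a flight-price dataset (all names and values are made up).
row = {
    "airline": "ExampleAir",
    "source": "Delhi",
    "destination": "Mumbai",
    "stops": 1,
    "duration_minutes": 170,
    "price": 5400,  # dependent (target) variable
}

TARGET = "price"

def describe_row(row, target):
    """Split one row into the target value, categorical features and numerical features."""
    features = {k: v for k, v in row.items() if k != target}
    categorical = {k: v for k, v in features.items() if isinstance(v, str)}
    numerical = {k: v for k, v in features.items() if not isinstance(v, str)}
    return row[target], categorical, numerical

target_value, categorical, numerical = describe_row(row, TARGET)
print("Target:", TARGET, "=", target_value)
print("Categorical features:", sorted(categorical))
print("Numerical features:", sorted(numerical))
```

Treating every string value as categorical is a simplification that works for this toy row; a real dataset may need an explicit list of categorical columns.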

4. Where did your data come from?

It is natural to think that this should be the first point when you start talking about your project, but it can work the other way around too. By talking about your project objective, its stakeholders, and the features first, you intrigue the interviewer to know more about the project. Once you’ve generated enough curiosity about the approach you’ve taken to solve the problem, you can mention the source of your dataset.

This dataset could come from the following sources:

1. Any coursework you were a part of
2. Dataset extracted from online repositories (Kaggle, UCI ML Repository, data.gov.in, etc)
3. Ethically mined data using third-party APIs

You should always reveal the source of your dataset as it marks the authenticity of the project you’re talking about.

5. Exploratory Data Analysis (in B-R-I-E-F)

This section is a “TRAP!”

Your dataset has so many features that you may feel a massive urge to talk about each one of them (and in detail, too!). This is the section where you have to keep in mind that there is a difference between a ‘Data Analyst’ and a ‘Data Scientist’.

According to an article,

“Some of the main differences revolve around automation of the analysis — data scientists focus on automating analysis and predictions with algorithms using programming languages like Python, whereas data analysts use stationary, or past data, and in some cases, will create predicted scenarios with tools like Tableau and SQL.”

This clearly helps you estimate the amount of time you should put in talking about the exploration you have done. It is sufficient to talk about the features and their impacts on the target variable but talking about a single feature in detail will hardly be of any help. You’ve already explained what each variable means while talking about your row above and most of the features are intuitive enough.

For example, if you have a regression problem predicting the price of a particular house, it is quite intuitive that the bigger the area of the house, the higher its value. So dwelling on one column and talking about it in depth will only steal your time and indicate to the interviewer that you cannot prioritize well. Instead, you can quickly skim through the EDA by showing them the graphs (if you’re allowed to use a PowerPoint presentation while talking about your project, you have a very good opportunity to structure it well and present it in the most concise and smart way).
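One quick way to summarise how features relate to the target without dwelling on any single column is to rank them by their correlation with the target. The sketch below is only an illustration under stated assumptions: the house-price numbers and feature names are invented toy data, and Pearson correlation captures only linear relationships.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy house-price data, invented for illustration only.
features = {
    "area_sqft": [800, 1000, 1200, 1500, 2000],
    "age_years": [30, 12, 20, 5, 8],
}
price = [100, 130, 150, 190, 250]

# One line per feature, strongest relationship first.
summary = sorted(
    ((name, pearson(values, price)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in summary:
    print(f"{name}: r = {r:+.2f}")
```

A one-line-per-feature summary like this lets you cover the whole EDA in a sentence or two and move on to the modelling approach.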

Remember, it’s your approach to solve the problem at hand and what insights you can provide to the stakeholders involved that are of interest to the interviewer.

EDA is a “part” of the process and not the whole deal.

Some use cases do require an extensive explanation of the analysis. Adjust the length of your discussion accordingly if you need to stress a particular feature in order to explain the model building and the approach taken.

6. Model Building

This is your “ARENA” which has the maximum capability of proving your skills as a true Data Scientist.

This section can be divided into 4 subparts:

1. The approach
2. Training Process
3. Model Tuning
4. Performance Metrics

An often overlooked phase of a project is building a baseline model. It is quite usual in the initial phases of your learning to skip this step as it is hardly ever talked about. In simple words, a baseline model is a simplistic version of a Machine Learning model that you can easily build on the dataset by doing very little preprocessing.

For example, if you have a regression problem, the first Machine Learning model that comes to mind is Linear Regression. So you take the basic dataset, do just enough preprocessing for a model to be able to make predictions, and run your model. The score of this model then becomes the point of comparison for the other models you build after tuning and final processing. Including a baseline model when you share your approach leaves a good impression.
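As a rough illustration of the baseline idea, the sketch below fits a one-feature linear regression by ordinary least squares and records its test error as the reference score. It also adds an even simpler "always predict the mean" reference, which is my own addition rather than something the article prescribes. All numbers are invented toy data, and the helper names are hypothetical.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Toy data invented for illustration: house area -> price.
train_x, train_y = [800, 1000, 1200, 1500, 2000], [100, 130, 150, 190, 250]
test_x, test_y = [900, 1700], [115, 210]

# Baseline model: plain linear regression with no tuning or feature engineering.
slope, intercept = fit_line(train_x, train_y)
baseline_rmse = rmse(test_y, [slope * x + intercept for x in test_x])

# An even simpler reference point: always predict the training mean.
mean_price = sum(train_y) / len(train_y)
mean_rmse = rmse(test_y, [mean_price] * len(test_y))

print(f"Predict-the-mean RMSE:  {mean_rmse:.2f}")
print(f"Linear-regression RMSE: {baseline_rmse:.2f}")
```

Every later model can then be reported relative to `baseline_rmse`, which makes the improvement from tuning and feature work easy to state in one sentence.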

This is also where you talk in depth about your oversampling/undersampling techniques if the dataset was highly imbalanced. You can also describe how you tackled data leakage, overfitting, and the bias-variance tradeoff, and how you used learning curves to improve your accuracy. There are various other aspects that you can highlight in this section to showcase the skills you have mastered and applied, such as:

Feature Scaling – Standardization and Normalization

The encoded variables – One Hot Encoding, Label Encoding

The Feature Reduction techniques used

Feature engineering that was performed
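For readers who want to see what the first two items in this list do to the data, here is a minimal standard-library sketch. The function names and the toy columns are assumptions made for illustration; in practice a library such as scikit-learn would normally handle these steps.

```python
from statistics import mean, pstdev

def standardize(values):
    """Standardization: rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Min-max normalization: rescale to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """One-hot encoding: one 0/1 column per category, categories in sorted order."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]

def label_encode(values):
    """Label encoding: map each category to an integer index."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return mapping, [mapping[v] for v in values]

# Toy columns, invented for illustration.
duration = [60, 90, 120, 180]
airline = ["B", "A", "B", "C"]

print(standardize(duration))
print(normalize(duration))
print(one_hot(airline))
print(label_encode(airline))
```

Label encoding implies an order between categories, so it is usually a better fit for ordinal features or tree-based models, while one-hot encoding avoids that implied order at the cost of extra columns.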

One question that will definitely come your way concerns the model you finalized: ‘Which model did you choose and why?’ Relying on just one Machine Learning model for your predictions is not good practice, so you test other models and settle on the one that gives you better accuracy on unseen data. Here you talk about the comparison between the different models you experimented with and the final model you chose to make predictions.

After selecting a model, you choose a set of hyperparameters based on trial and error or using approaches like GridSearchCV or RandomizedSearchCV. Explaining the model tuning process gives you an edge and indicates to the interviewer that you are aware of the basic Machine Learning concepts.
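To explain tuning clearly, it helps to know what a grid search with cross-validation does under the hood. The sketch below is a hand-rolled illustration, not scikit-learn's GridSearchCV itself: it tries each candidate value of one hyperparameter (the number of neighbours `k` in a tiny k-nearest-neighbours regressor), scores it with k-fold cross-validation, and keeps the best. The model, the data, and all function names are assumptions chosen to keep the example small.

```python
def knn_predict(train, x, k):
    """Predict y for x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cross_val_mse(data, k, n_folds):
    """Mean squared error of k-NN averaged over n_folds held-out folds."""
    fold_errors = []
    for fold in range(n_folds):
        # Every n_folds-th point (starting at `fold`) is held out; the rest is training data.
        held_out = data[fold::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds

def grid_search(data, param_grid, n_folds=3):
    """Try every candidate value and return (best_k, scores_by_k)."""
    scores = {k: cross_val_mse(data, k, n_folds) for k in param_grid}
    return min(scores, key=scores.get), scores

# Toy (x, y) pairs, invented for illustration: y is roughly 2x with small noise.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1),
        (6, 12.2), (7, 13.8), (8, 16.1), (9, 18.0)]

best_k, scores = grid_search(data, param_grid=[1, 2, 3, 5])
for k, mse in scores.items():
    print(f"k={k}: cross-validated MSE = {mse:.3f}")
print("Best k:", best_k)
```

A randomized search follows the same loop but samples a fixed number of candidates from the grid instead of trying all of them, which matters once several hyperparameters are tuned together.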

Finally, you talk about the metric you chose to evaluate your model. Selecting an evaluation metric suitable to the use case is of utmost importance as it indicates your ability to completely understand the problem at hand and evaluate it in a way that affects the business involved directly without having to bargain on its most important aspects. It is a great indicator of your ability to analyze effectively and logically.
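A small worked example shows why the choice of metric matters. In the sketch below, the labels are invented toy data loosely modelled on the insurance example from earlier (1 = customer buys, 0 = does not), with buyers being rare. A model that never predicts a buyer still scores high on accuracy, while recall exposes that it finds no buyers at all.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

def report(actual, predicted):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels, invented for illustration: 5 buyers among 100 customers.
actual = [1] * 5 + [0] * 95
always_no = [0] * 100                               # never predicts a buyer
some_skill = [1, 1, 1, 0, 0] + [1] * 5 + [0] * 90   # finds 3 of 5 buyers, 5 false alarms

print("always_no: ", report(actual, always_no))
print("some_skill:", report(actual, some_skill))
```

Here the "never predicts a buyer" model has the higher accuracy but zero recall, so if missing a buyer is the costly mistake for the business, recall or F1 is the more honest metric to report.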

7. Model Deployment

It is one thing to build a Machine Learning model on a training dataset in a Jupyter notebook and a totally different thing to use that model to predict values on data it has never seen before. Learning a way or two to deploy your model shows that you know how to take your project into production and make it easy for a layperson to use without having to see the technicalities behind it. You could deploy it as a web app or an API. It is highly beneficial to have your model deployed on a platform and ready to show to the recruiter to gain those extra brownie points.
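As a rough idea of what "deploying as an API" can look like at its smallest, the sketch below wraps a prediction function in an HTTP endpoint using only Python's standard library. Everything model-related here is a stand-in: the coefficients are hard-coded for illustration, the `area_sqft` field is a hypothetical input, and a real project would load a saved model and most likely use a web framework and a hosting platform instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "trained model": coefficients hard-coded purely for illustration.
# In a real project these would be loaded from a saved model file.
MODEL = {"intercept": 3.0, "slope_per_sqft": 0.12}

def predict(area_sqft):
    """Return the predicted price for one input value."""
    return MODEL["intercept"] + MODEL["slope_per_sqft"] * area_sqft

def handle_payload(body):
    """Turn a JSON request body into a (status_code, response_dict) pair."""
    try:
        area = float(json.loads(body)["area_sqft"])
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with a numeric 'area_sqft' field"}
    return 200, {"predicted_price": round(predict(area), 2)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body, run the model, and send the result back as JSON.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(port=8000):
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server locally, after which a POST request with a body such as `{"area_sqft": 1200}` returns a JSON prediction. Keeping the request handling in `handle_payload` separate from the server class makes the prediction logic easy to test without starting a server.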

If not to impress the recruiter, you’d still want to deploy it to show the world where you’ve been putting in all that BLOOD, SWEAT, AND TEARS!

Model Deployment — SOURCE: Author’s humorous efforts

Last But Not The Least:

There will definitely be some questions that will pop up in the interviewer’s mind during your explanation of the project and you should leave no stone unturned when it comes to revising your project. You should keep some questions about your project ready to be answered if and when they are asked to you.

Important Note:

There is no doubt you worked day in and day out to understand the nuances of the project and completed it with 100% of your potential. During your interview, however, what matters is not how many hours you put in but how concisely you can convey both the technical and the business aspects of the project in the short time that you have.

AUTHOR’S NOTE:

Having been in this field for 3 years now, I can confidently say that my love for data and its magic only increases with each passing day. This article is the result of interviews that showed me the right way to talk about my own projects. It is curated from all the interview questions I had to field, and refined with each of those experiences. There is never a one-size-fits-all answer, but this guide can be one succinct way to organize your answers.

In case you have any feedback or wish to discuss further on this topic, please comment below or drop a text on my LinkedIn and I’d be more than happy to connect.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.




