After arming yourself with all the relevant industry skills and putting hours of your time, energy, and soul into your projects, you face the most daunting task of all: “APPLYING FOR A JOB”. Fortunately, your profile appears at the top of the list and you get shortlisted for the interview. WHAT NEXT?
The first thing that comes to mind is revising all the concepts and going through all that Machine Learning jargon. But there is another ‘must-do’ that should be on your preparation list, one with the greatest potential to bring the trophy home (a.k.a. help you bag that job): ‘REVISING YOUR PROJECTS’.
There is no escaping talking about your projects. You may know all the Machine Learning hacks and concepts, but they are hardly of any use if none of them is ever applied to an actual Machine Learning model (i.e. executing on those skills and coming up with solutions that matter). One question you are sure to face in a Data Science interview concerns your projects. Recruiters specifically ask it to know:
This article focuses strategically on these 5 points throughout.
This question may seem daunting at first, but with a clear understanding of the project and a concise way of talking about it, it can become the one question that lets you steer the interview in your favor and impress the recruiters with what you already know. It can be thrown your way in any of the following forms:
There is no one-size-fits-all answer to this question, but you can bring some structure to it and adapt that structure to the nature of the project, its complexity, and your view of the problem statement.
This article lists 7 basic steps (points) that you can keep in mind while structuring your answer and briefing the interviewer about your project. The steps are ordered linearly, and answering each of them in turn can help you build a concrete version of your response.
You do not have to follow them strictly in that order; modify your responses according to your needs.
This is the most crucial step in the process. It is poor practice to talk about projects that are irrelevant to the company you have applied to.
Suppose the company builds robots that interact with users and respond accordingly. Intuitively, you know that such a company deals heavily with Natural Language Processing (NLP). In that case, a project that predicts house prices from a few numerical features would be largely irrelevant.
Choosing a relevant project, one whose use case would interest the company and aid its operations, is the wisest choice you can make. Talking about irrelevant projects only signals that you cannot prioritize well. The selected projects could come from your current organization, internships, or datasets chosen from online platforms (e.g. Kaggle, the UCI ML Repository).
There has to be absolute clarity when you brief the interviewer about your project, as this is the first step that will grab their attention. Explain in simple words what you were trying to achieve with this specific project. Five to seven lines should suffice to explain the problem statement at hand.
Specifying the stakeholders of the project shows the recruiter that you know the project well and understand exactly how it could be implemented in a real scenario. It also showcases your business acumen: how well you understand the problem and how you can articulate it to the project's stakeholders in a way that gives them key insights.
It could be a regression problem predicting the value of air tickets at a specific time of the day or a classification problem predicting whether a customer will buy the insurance provided by the company or not. Having a thorough knowledge of the problem statement will help you think of the key stakeholders involved.
“A stakeholder is someone who will be directly affected by the findings or the predictions at the end of the project cycle.”
In the case of the air-ticket prediction problem,
There can be various other stakeholders involved apart from the ones mentioned above.
One single row represents exactly what the problem is trying to solve. A single row comprises all the features used and the dependent target variable that the Machine Learning model will predict. One way to talk about it is to start with the dependent (target) variable and explain what the final prediction would look like. The features can then be talked about by dividing them into categorical features and numerical features.
You might ask, ‘Why is this even needed?’ Experience shows that talking about every single column (as you would during EDA) does not convey the gist of the features involved, and you may end up looping in your own explanation while trying to figure out which column to talk about next. Focusing on just one row, by contrast, makes the explanation much more concise and easier for the recruiter to follow.
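To make the single-row idea concrete, here is a minimal Python sketch for the air-ticket example. The column names and values are hypothetical (not taken from any real dataset), and the categorical/numerical split is inferred naively from each value's Python type, an assumption that real datasets often break (for example, numeric codes that are really categories).

```python
# Hypothetical single row from an air-ticket price dataset: the column names
# and values are illustrative assumptions, not taken from a real dataset.
row = {
    "airline": "Airline_A",
    "source_city": "Delhi",
    "destination_city": "Mumbai",
    "departure_time": "Morning",
    "stops": 0,
    "days_before_departure": 14,
    "duration_hours": 2.2,
    "price": 5400,  # target variable the model will predict
}


def describe_row(row, target):
    """Split one row into its target and categorical/numerical feature names.

    Naive assumption: string values are categorical, int/float values are numerical.
    """
    features = {k: v for k, v in row.items() if k != target}
    categorical = [k for k, v in features.items() if isinstance(v, str)]
    numerical = [
        k for k, v in features.items()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {"target": target, "categorical": categorical, "numerical": numerical}


summary = describe_row(row, target="price")
print("Target:", summary["target"])
print("Categorical features:", ", ".join(summary["categorical"]))
print("Numerical features:", ", ".join(summary["numerical"]))
```

When talking it through, you would start from `price` (what the final prediction looks like) and then walk through the two groups of features.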
It is natural to think that this should be the first point when you start talking about your project, but it can work the other way around too. By first talking about your project objective, its stakeholders, and the features, you intrigue the interviewer into wanting to know more about the project. Once you have generated enough curiosity about the approach you took to solve the problem, you can mention the source of your dataset.
This dataset could come from the following sources:
You should always reveal the source of your dataset as it marks the authenticity of the project you’re talking about.
This section is a “TRAP!”
Your dataset may have so many features that you feel a massive urge to talk about each one of them (and in detail, too!). This is where you have to keep in mind that there is a difference between a ‘Data Analyst’ and a ‘Data Scientist’.
According to an article,
“Some of the main differences revolve around automation of the analysis — data scientists focus on automating analysis and predictions with algorithms using programming languages like Python, whereas data analysts use stationary, or past data, and in some cases, will create predicted scenarios with tools like Tableau and SQL.”
This helps you estimate how much time you should spend talking about your exploration. It is sufficient to talk about the features and their impact on the target variable; discussing a single feature in detail will hardly help. You have already explained what each variable means while walking through your row, and most of the features are intuitive enough.
For example, in a regression problem that predicts the price of a house, it is intuitive that the bigger the house, the higher its price. Dwelling on that one column in depth will only eat up your time and signal to the interviewer that you cannot prioritize well. Instead, skim quickly through the EDA by showing the relevant graphs. (If you are allowed to use a PowerPoint presentation while talking about your project, you have a very good opportunity to structure it well and present it in a concise, smart way.)
Remember, what interests the interviewer is your approach to solving the problem at hand and the insights you can provide to the stakeholders involved.
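As a compact alternative to dwelling on one column, a sketch like the following summarizes each numerical feature's linear relationship with the target in a single line. The house data below is made up for illustration, and Pearson correlation only captures linear relationships, so treat it as a quick talking point rather than a full EDA.

```python
import math

# Hypothetical toy house data (made-up numbers, for illustration only).
houses = {
    "area_sqft": [850, 1200, 1500, 1800, 2400],
    "age_years": [30, 12, 20, 8, 5],
    "price": [120, 170, 200, 250, 330],  # target, in thousands
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def feature_impacts(data, target):
    """Rank features by the strength of their linear relationship with the target."""
    scores = {col: pearson(vals, data[target]) for col, vals in data.items() if col != target}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


for feature, r in feature_impacts(houses, "price"):
    print(f"{feature:>10}: r = {r:+.2f}")
```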
EDA is a “part” of the process and not the whole deal.
Some use cases do require an extensive explanation of the analysis. If you need to stress a particular feature to explain your model building and approach, adjust the length of your discussion accordingly.
This is your “ARENA” which has the maximum capability of proving your skills as a true Data Scientist.
This section can be divided into 4 subparts:
- The approach
- Training Process
- Model Tuning
- Performance Metrics
An often overlooked phase of a project is building a baseline model. Early in your learning it is common to skip this step, as it is hardly ever talked about. Put simply, a baseline model is a simple version of a Machine Learning model that you can quickly build on the dataset with very little preprocessing.
For example, if you have a regression problem, the first Machine Learning model that comes to mind is Linear Regression. You take the basic dataset, apply just enough preprocessing for the model to make predictions, and run it. The score this model achieves then becomes the point of comparison for the other models you build after tuning and final processing. Including a baseline model when you share your approach creates a strong impression.
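A baseline can be as simple as the sketch below: a one-feature linear regression fitted with ordinary least squares in plain Python and scored with RMSE on a small held-out set. The house-area and price numbers are hypothetical, and in practice you would more likely reach for a library such as scikit-learn; the point is only the workflow of recording a reference score.

```python
import math

# Hypothetical toy data: house area (sq. ft) -> price (in thousands).
# The numbers are made up purely to illustrate the workflow.
train_area = [800, 1000, 1200, 1500, 1800, 2000]
train_price = [110, 140, 165, 205, 240, 270]
test_area = [900, 1600]
test_price = [125, 215]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


slope, intercept = fit_simple_linear_regression(train_area, train_price)
baseline_preds = [slope * x + intercept for x in test_area]
baseline_rmse = rmse(test_price, baseline_preds)
print(f"Baseline RMSE: {baseline_rmse:.2f}")  # the score later models must beat
```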
This is also where you talk in detail about your oversampling/undersampling techniques if the dataset was highly imbalanced. You can also describe how you tackled data leakage, overfitting, and the bias-variance tradeoff, and how you improved your accuracy with the help of learning curves. There are various other aspects you can highlight here to showcase the skills you have mastered and applied, some of which are:
- Feature Scaling – Standardization and Normalization
- The encoded variables – One Hot Encoding, Label Encoding
- The Feature Reduction techniques used
- Feature engineering that was performed
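Several of the preprocessing steps mentioned above can be sketched in plain Python as follows. The toy ages, cities, and labels are hypothetical; libraries like scikit-learn provide production-ready versions of these transformations, and the naive random oversampler here simply duplicates minority-class rows (it is not SMOTE).

```python
import random
import statistics

# Hypothetical toy columns, for illustration only.
ages = [22, 35, 47, 51, 64]
cities = ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai"]


def standardize(values):
    """Z-score standardization: zero mean, unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def label_encode(categories):
    """Map each category to an integer (sorted order, as a simple convention)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def one_hot_encode(categories):
    """One binary column per category, in sorted category order."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories], levels


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until both classes are the same size.

    Assumes a binary target; apply it to the training split only to avoid data leakage.
    """
    rng = random.Random(seed)
    counts = {y: labels.count(y) for y in set(labels)}
    minority = min(counts, key=counts.get)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    extra = [rng.choice(minority_rows) for _ in range(max(counts.values()) - counts[minority])]
    return rows + extra, labels + [minority] * len(extra)


print(standardize(ages))
print(min_max_normalize(ages))
print(label_encode(cities))
print(one_hot_encode(cities))
X_bal, y_bal = random_oversample([[1], [2], [3], [4], [5]], [0, 0, 0, 0, 1])
print(y_bal)
```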
One question that will definitely come your way concerns the model you finalized: ‘Which model did you choose and why?’ Relying on a single Machine Learning model for your predictions is not good practice, so you test several models and settle on the one that performs best on unseen data. This is where you compare the different models you experimented with and explain the final model you chose to make predictions.
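One way to back up your answer to ‘Which model did you choose and why?’ is to score every candidate with the same cross-validation procedure and pick the one with the lowest error. The sketch below assumes hypothetical toy data and three simple candidate models written by hand (a mean predictor, linear regression, and 3-nearest-neighbours regression); it illustrates the comparison and is not a recommended set of models.

```python
import math

# Hypothetical toy data (area in sq. ft -> price in thousands), for illustration only.
X = [800, 950, 1000, 1200, 1350, 1500, 1650, 1800, 2000, 2200]
y = [110, 128, 140, 165, 182, 205, 220, 240, 270, 292]


def fit_mean(xs, ys):
    """Predict the training mean for every input."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """One-feature ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=3):
    """Average the targets of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(b for _, b in nearest) / k

    return predict


def cv_rmse(fit, xs, ys, folds=5):
    """Mean RMSE over contiguous folds (shuffle real data before splitting)."""
    size = len(xs) // folds
    scores = []
    for f in range(folds):
        lo, hi = f * size, (f + 1) * size
        model = fit(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:])
        errs = [(model(a) - b) ** 2 for a, b in zip(xs[lo:hi], ys[lo:hi])]
        scores.append(math.sqrt(sum(errs) / len(errs)))
    return sum(scores) / folds


candidates = {"mean baseline": fit_mean, "linear regression": fit_linear, "3-NN": fit_knn}
results = {name: cv_rmse(fit, X, y) for name, fit in candidates.items()}
best = min(results, key=results.get)
for name, score in results.items():
    print(f"{name:>17}: CV RMSE = {score:.1f}")
print("Chosen model:", best)
```

The useful part in an interview is being able to say why the winner won; on this near-linear toy data, linear regression comes out ahead.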
After selecting a model, you choose a set of hyperparameters based on trial and error or using approaches like GridSearchCV or RandomizedSearchCV. Explaining the model tuning process gives you an edge and indicates to the interviewer that you are aware of the basic Machine Learning concepts.
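GridSearchCV exhaustively tries every combination in a parameter grid and keeps the one with the best cross-validated score. The sketch below reproduces that idea by hand for a toy k-nearest-neighbours regressor, using leave-one-out cross-validation. The data is made up, and the hyperparameter names (`k`, `weights`) are illustrative choices that loosely mirror scikit-learn's `KNeighborsRegressor` options rather than its actual API.

```python
import itertools
import math

# Hypothetical noisy toy data (made up for illustration): y roughly follows x squared.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [1.2, 3.8, 9.1, 16.3, 24.8, 36.2, 48.9, 64.1, 81.3, 99.7, 121.2, 143.8]


def knn_predict(train, x, k, weights):
    """Predict y at x from the k nearest (x, y) training pairs (single feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    if weights == "uniform":
        return sum(t for _, t in nearest) / k
    # "distance": closer neighbours count more (assumes no duplicate x values)
    w = [1 / abs(a - x) for a, _ in nearest]
    return sum(wi * t for wi, (_, t) in zip(w, nearest)) / sum(w)


def loo_rmse(xs, ys, k, weights):
    """Leave-one-out cross-validated RMSE for one hyperparameter combination."""
    errs = []
    for i in range(len(xs)):
        train = [(a, b) for j, (a, b) in enumerate(zip(xs, ys)) if j != i]
        errs.append((knn_predict(train, xs[i], k, weights) - ys[i]) ** 2)
    return math.sqrt(sum(errs) / len(errs))


# Every combination in the grid is evaluated, exactly as an exhaustive grid search would.
param_grid = {"k": [1, 3, 5], "weights": ["uniform", "distance"]}
scores = {
    (k, w): loo_rmse(X, y, k, w)
    for k, w in itertools.product(param_grid["k"], param_grid["weights"])
}
best_params = min(scores, key=scores.get)
print("Best (k, weights):", best_params, "LOO RMSE:", round(scores[best_params], 2))
```

A randomized search would instead sample a fixed number of combinations from the grid, which matters once the grid grows too large to try exhaustively.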
Finally, you talk about the metric you chose to evaluate your model. Selecting an evaluation metric suited to the use case is of utmost importance: it shows that you fully understand the problem at hand and can evaluate the model in terms that matter directly to the business, without compromising on its most important aspects. It is a great indicator of your ability to analyze effectively and logically.
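To see why the metric must match the use case, take the insurance example: if only a small fraction of customers buy, a model that always predicts ‘won’t buy’ scores high accuracy while finding no buyers at all. The labels below are hypothetical (5 buyers out of 100), chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical labels for "will the customer buy the insurance?" (1 = buys).
# 5 buyers out of 100 customers: a heavily imbalanced, made-up example.
y_true = [1] * 5 + [0] * 95
always_no = [0] * 100  # a lazy model that predicts "won't buy" for everyone


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual buyers the model finds (0.0 if there are no positives)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


print(f"Accuracy: {accuracy(y_true, always_no):.2f}")  # prints "Accuracy: 0.95"
print(f"Recall:   {recall(y_true, always_no):.2f}")    # prints "Recall:   0.00"
```

If finding potential buyers is what the business cares about, recall (or a precision/recall trade-off) is the more honest metric here.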
It is one thing to build a Machine Learning model on a training dataset in a Jupyter notebook and a totally different thing to use that model to predict values on data it has never seen before. Learning a way or two to deploy your model shows that you know how to take your project into production and make it easy for a layperson to use without having to see the technicalities behind it. You could deploy it as a web app or an API. It is always highly beneficial to have your model deployed on a platform and ready to show the recruiter for those extra brownie points.
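A minimal sketch of the API route, assuming a trained one-feature house-price model whose coefficients (`SLOPE`, `INTERCEPT`) are hypothetical placeholders. It uses only Python's built-in `wsgiref` module, which is fine for a demo but not for production, where you would more likely use a web framework behind a proper application server.

```python
import io
import json
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

# Hypothetical coefficients of an already-trained one-feature house-price model.
SLOPE, INTERCEPT = 0.13, 7.4
JSON_HEADERS = [("Content-Type", "application/json")]


def predict_price(area_sqft):
    return SLOPE * area_sqft + INTERCEPT


def app(environ, start_response):
    """Minimal WSGI JSON API: POST /predict with a body like {"area_sqft": 1200}."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", JSON_HEADERS)
        return [json.dumps({"error": "use POST /predict"}).encode("utf-8")]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        body = {"predicted_price": predict_price(float(payload["area_sqft"]))}
        status = "200 OK"
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing key, or a non-numeric value.
        body = {"error": "send a JSON object with a numeric 'area_sqft'"}
        status = "400 Bad Request"
    start_response(status, JSON_HEADERS)
    return [json.dumps(body).encode("utf-8")]


def call_locally(raw_body, path="/predict"):
    """Exercise the app in-process, without opening a network port."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw_body)),
        "wsgi.input": io.BytesIO(raw_body),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    response = b"".join(app(environ, start_response))
    return captured["status"], json.loads(response)


def serve(host="127.0.0.1", port=8000):
    """Run a local development server (blocks until interrupted)."""
    with make_server(host, port, app) as server:
        print(f"Serving on http://{host}:{port}/predict")
        server.serve_forever()
```

For example, `call_locally(b'{"area_sqft": 1000}')` returns a `200 OK` status with the predicted price, while `serve()` starts a development server that any HTTP client can call.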
If not to impress the recruiter, you’d still want to deploy it to show the world where you’ve been putting in all that BLOOD, SWEAT, AND TEARS!
There will definitely be some questions that pop up in the interviewer’s mind while you explain your project, so leave no stone unturned when revising it. Keep answers ready for the questions you expect about your project, if and when they are asked.
There is no doubt that you worked day in and day out to understand the nuances of the project and gave it 100% of your potential. During the interview, however, what matters is not how many hours you put in but how concisely you can convey both its technical and business aspects in the short time you have.
Having been in this field for 3 years now, I can confidently say that my love for data and its magic only grows with each passing day. This article is the result of interviews that showed me the right way to talk about my own projects. It is curated from all the interview questions I had to answer, refined with each of those experiences. There is never a one-size-fits-all answer, but this guide offers one succinct way to organize yours.
If you have any feedback or wish to discuss this topic further, please comment below or message me on LinkedIn; I’d be more than happy to connect.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.