One of the most common questions asked these days is what makes a good data scientist. The simple answer – it depends. The long answer – someone who can lead all the phases of a data science project. For an even longer answer, read on.
A Data Science project is not just a hackathon competition where a ready-made dataset is provided and the success metric or the error to optimize is clearly laid out.
So what’s different? A data science project has several phases: getting the context of the problem, understanding the data, deep diving into it, understanding implementation and coding shortcomings, figuring out the right set of algorithms to use, coding those algorithms, evaluating their performance from both an engineering and a data science perspective, and optimization.
As you can imagine, a data science skillset is a mixture of what was traditionally called computer science and business analytics. Given the breadth and depth of the work, you are unlikely to find one person who knows all these aspects (let alone is good at them). Instead, it's better to build a team with a mix of people who specialize in the different areas a data science project requires.
In this article, we will look at what types of data scientists there are, how to find them, what the current hiring process is and how it can be improved.
Given this prelude, I am going to help us sort the existing talent pool in the market into categories of skill sets and knowledge based on three dimensions – Context, Coding and Concept.
Context: An understanding of the context in which the problem is set.
Coding: Simply put, fluency in R, Python or any other open source data science tool with which the person can analyze data, create features and build a model, plus the ability to work with the implementation team to get the code into a production environment.
Concept: The depth of understanding of the technical solution. This covers the ability to understand the algorithm in detail, some knowledge of the literature in the area, and the ability to do a literature survey and differentiate or adapt the solution to the given problem.
The size of the bubble in the above chart purely measures knowledge of the algorithms and depth of understanding of them.
Now that we understand the kinds of data science talent available, how do you, as a start-up, a mid-level company or an enterprise, match the right pool to the available job? Whom do you choose, and what weight do you give to each of the 3Cs at each stage of your company? Let’s examine this in a bit more detail.
If you are starting up or building a data science pool in your organization, chances are that the problem is not well defined and is still very blurry. The need of the hour could be breadth rather than depth. The balance could tilt towards Geeky Business Analysts, Data Scientists and Data Engineers rather than Algorithm Specialists. Depending on the nature of the problem, the mix could be 30 – 40% Geeky Business Analysts, with the rest divided between Data Scientists and Data Engineers.
At this stage, I would assume that the problem is well defined. There may be existing data science solutions based either on machine learning or on some other technique. The need of the hour may be to upgrade those solutions and get more of them into deployment. I would recommend this mix – 40% Data Scientists, 20% Data Engineers, 20% Algorithm Specialists and 20% Geeky Business Analysts.
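To make the recommended 40/20/20/20 mix concrete, here is a minimal sketch that turns it into whole-number headcounts for a given team size. The function name `headcount_from_mix`, the constant `RECOMMENDED_MIX` and the rule for handing out leftover seats (largest remainder first) are illustrative assumptions, not part of the recommendation itself.

```python
def headcount_from_mix(team_size, mix):
    """Split team_size into whole headcounts proportional to mix (percentages)."""
    if sum(mix.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    # Integer share per role, rounded down.
    counts = {role: team_size * pct // 100 for role, pct in mix.items()}
    # Seats still unassigned after rounding down.
    leftover = team_size - sum(counts.values())
    # Assumption: give leftover seats to the roles with the largest remainder.
    by_remainder = sorted(mix, key=lambda r: team_size * mix[r] % 100, reverse=True)
    for role in by_remainder[:leftover]:
        counts[role] += 1
    return counts


# Percentages taken from the recommendation above.
RECOMMENDED_MIX = {
    "Data Scientist": 40,
    "Data Engineer": 20,
    "Algorithm Specialist": 20,
    "Geeky Business Analyst": 20,
}

print(headcount_from_mix(10, RECOMMENDED_MIX))
```

For a team of 10 this gives 4 Data Scientists and 2 of each other role. For sizes that do not divide evenly, the rounding rule decides who gets the extra seat, so treat the output as a starting point for discussion rather than a fixed plan.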
For organizations that want to have a research division, the mix could shift towards Algorithm Specialists. They can afford to have fewer Data Scientists and Business Analysts. The idea here is that the organization aims to contribute more to research journals and wants to mark its space in certain areas or specializations.
But sometimes during this search for talent, we also come across what I like to call “Super Scientists”. Finding a super data scientist is 10 times tougher than finding a full-stack developer, which is why there is no industry tag for them. There is also a fundamental mistake of evaluating data scientists only in terms of their knowledge of ML or Python (or any other tool). This yardstick effectively measures only the efficiency of getting from modeling to model delivery and leaves the other parts to mere chance. Salary is not a yardstick for finding these super scientists either, as very few companies realize their potential, so few would have paid them a premium.
Before we see how to find them, let’s take a look at what a super scientist is capable of doing.
As you can see, all 10 steps are important.
Currently, most hiring organizations evaluate data scientists only on points 4-5, in the form of an interview discussion. Even there, the focus ends up being too much on knowledge and too little on application. How do you get your code into production? Can you streamline your pipeline to work with the hardware (and even software) that already exists in the organization? These are critical questions that I feel are not asked enough in interviews.
More or less, the remaining 8 steps are left to chance. It's important that we start innovating on how we test the usefulness of a person to a job rather than how much the person knows.
Case studies can be a key instrument in testing all 10 points. They can be presented as real data science problems that would show up on the job. For example, instead of interviewing on collaborative filtering, one can give a problem statement: we want to show or send the right items to the right set of users. Then we can evaluate how the candidate arrives at a solution and how the person thinks about success metrics, KPIs, etc. Create a scenario where the interviewer plays the role of a business or problem owner and see how the candidate reacts to constraints, be it on data or on implementation. Then deep dive into programming and algorithms.
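As an illustration of the kind of success metric a candidate might propose for the "right items to the right users" case study, here is a minimal sketch of precision at k: the share of the top k recommended items that the user actually engaged with. The choice of metric, the function name `precision_at_k` and the item names are assumptions made for this example; a candidate could just as reasonably argue for a business KPI instead.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical data: a ranked recommendation list and the items the user engaged with.
recommended = ["shoes", "socks", "hat", "belt", "scarf"]
relevant = {"socks", "scarf", "gloves"}

print(precision_at_k(recommended, relevant, 3))
print(precision_at_k(recommended, relevant, 5))
```

What matters in the interview is less the formula than whether the candidate can explain why a metric like this would, or would not, reflect what the business owner cares about.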
This is my humble attempt at explaining how to build a data science team and how to recruit elusive super scientists. Now, time to find the needle in the haystack!
If you have ever been in a hiring role, what has your experience been like? On the other hand, folks looking for a data scientist role – what are some of the challenges you have faced in your journey? Use the comments section below to let me know!
Mathangi is currently building a Data Science team at PhonePe. She has a proven track record of 13+ years in building world-class data science solutions and products. She has worked extensively on building chatbots and productizing text mining insights. She has 6 patent grants and 20+ patents pending in the areas of intuitive customer service, indoor positioning and user profiles. She is adept across machine learning, text mining and NLP technologies and tools.