Congratulations on choosing data science as your future career! It’s a great decision.
Data science is a thriving field with a remarkable number of job openings around the globe. The demand is outstripping the supply! That means there are more vacancies than qualified data science professionals.
So you have set out to become a hands-on data science professional, and you can probably already see why it’s a path to future success. There are a variety of problems you can solve, a whole host of tools you can master, and a broad range of techniques you can learn and then play around with.
And here’s even better news – you are already in a much better place than most candidates! As someone who is already in a data-based role, you hold a massive advantage over almost anyone trying to transition into data science because of the following reasons:
The canvas is in front of you – now it’s your turn to pick up the data science brush and start painting your way to a successful data science transition.
A hands-on data science role is a little bit of programming, a little bit of statistics, a pinch of business domain knowledge, and a whole lot of forming and understanding the problem statement.
Data science may be the sexiest job of the 21st century, but like all jobs, it requires hard work. A day-to-day hands-on role in data science means working on the same problem for long hours and performing continuous, in-depth research. This role requires you to be well-versed in probability and statistics, programming, and machine learning.
A data science role requires you to be in continuous communication with the stakeholders as well as other teams. On the soft skills side, you’d want to keep working on your communication skills, storytelling skills, and structured thinking ability. We’ll talk about these skills in a moment.
A typical data science project lifecycle looks like this:
Depending on your role, your project, and your organization, you’ll be working on different stages. Some projects require a data scientist to do the end-to-end work. Most projects will expect you to be involved from the start but will leave the data collection and model deployment stages to data engineers. It all comes down to specific use cases.
Since you are already working in a data-based role (whether that’s MIS/reporting/business intelligence), you should be familiar with the “first half” of the above lifecycle. You would have worked till the data exploration stage – and now need to make the leap to the data modeling phase.
Data science is a multi-faceted role. There is no one-size-fits-all approach to learning data science. Having said that, there are a few core skills you will need to pick up to make a successful career transition to data science.
Here are the key skills you would need:
Apart from these core skills, there are other skills you should be aware of, such as:
Ah, the key question! Now that you know what you need to learn, the attention turns to how you can learn those skills. Let’s look at a few options and suggestions on how to pick up and hone the key skills we mentioned above.
Machine learning has seen a great jump largely because of the boost in computing power. Programming provides us a way to communicate with machines. Do you need to become the best at programming? Not at all. But you will definitely need to be comfortable with it.
First, pick a programming language. Python, R, and Julia are a few of the options, and each has its own set of pros and cons. Python is a general-purpose programming language with many data science libraries and support for rapid prototyping, whereas R is a language built for statistical analysis and visualization. Julia aims to combine ease of use with speed. If you are confused about which language to choose, we have compiled a resourceful article for you:
Python is the market leader right now and continues to be widely used in the industry. Machine learning tasks are comparatively easy in Python thanks to its wide range of libraries and strong support for deep learning.
Statistics is the grammar of data science.
Just as you must know grammar before you can write correct sentences, you need statistics before you can produce high-quality models. Machine learning starts out as statistics and then advances from there. Even linear regression is an age-old statistical technique.
Knowledge of descriptive statistics such as the mean, median, mode, variance, and standard deviation is a must. Then come the various probability distributions, sample and population, the central limit theorem (CLT), skewness and kurtosis, and inferential statistics: hypothesis testing, confidence intervals, and so on.
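To make the descriptive statistics above concrete, here is a minimal sketch using only Python’s standard `statistics` module. The `monthly_sales` figures are invented purely for illustration; they are not from any real dataset.

```python
import statistics

# Hypothetical monthly sales figures, made up for this example.
monthly_sales = [120, 135, 150, 135, 160, 175, 135, 190]

mean_sales = statistics.mean(monthly_sales)          # arithmetic average
median_sales = statistics.median(monthly_sales)      # middle value of the sorted data
mode_sales = statistics.mode(monthly_sales)          # most frequent value
variance_sales = statistics.variance(monthly_sales)  # sample variance (divides by n - 1)
std_dev_sales = statistics.stdev(monthly_sales)      # sample standard deviation

print("mean:", mean_sales)
print("median:", median_sales)
print("mode:", mode_sales)
print("variance:", round(variance_sales, 2))
print("standard deviation:", round(std_dev_sales, 2))
```

Note that `variance` and `stdev` treat the data as a sample; if your data is the whole population, `pvariance` and `pstdev` are the matching functions.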
Statistics is a must-have for any data scientist. You can deep dive into some of these concepts with these clear articles and their examples:
For a data scientist, machine learning is the core skill to have. Machine learning is used to build predictive models. For example, if you want to predict the number of customers you will have next month by looking at past months’ data, you will need to use machine learning algorithms.
You can start with simple linear and logistic regression models and then move on to advanced ensemble models like Random Forest, XGBoost, CatBoost, and so on. It’s good to know the code for these algorithms (which just takes 2-3 lines), but what’s most important is to know how they work. This will help you with hyperparameter tuning and, ultimately, with building a model that gives a low error rate.
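To illustrate what “knowing how it works” can mean for the simplest of these models, here is a rough sketch of a one-feature linear regression fitted by ordinary least squares in plain Python. The helper `fit_line` and the customer counts are hypothetical and exist only for this example; in practice you would more likely reach for a library such as scikit-learn.

```python
def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums; their ratio is the slope.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: customers observed in months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
customers = [205, 215, 245, 255, 285, 295]

slope, intercept = fit_line(months, customers)

# Forecast for month 7 by extending the fitted trend line.
next_month_forecast = slope * 7 + intercept
print("slope:", round(slope, 2))
print("intercept:", round(intercept, 2))
print("forecast for month 7:", round(next_month_forecast, 1))
```

A straight trend line is a deliberately simple assumption here; real customer data often has seasonality or other structure that this model would miss.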
If you are looking for specialization, Natural Language Processing (NLP) and Computer Vision are two fields that are absolutely thriving right now. Each requires you to dive deep into those specific fields so make sure you’re aware of what you’re getting into.
Structured thinking is a process of putting a framework to an unstructured problem. Having a structure not only helps an analyst understand the problem at a macro level, but it also helps by identifying areas that require deeper understanding.
Without structure, an analyst is like a tourist without a map. He might understand where he wants to go (or what he wants to solve), but he doesn’t know how to get there. He would not be able to judge which tools and vehicles he would need to reach the desired place.
How many times have you come across a situation where the entire work had to be redone because a particular segment was not excluded from the data? Or a segment was not included? Or when, just as you were about to finish the analysis, you came across a factor you had not thought of before? All of these are results of poorly structured thinking.
As a hands-on data science professional, you’ll be working a LOT with databases. You will need them to extract your data, extract subsets, and extract samples.
Hence, having hands-on knowledge of databases is essential. The most common database language you should pick up is SQL.
SQL is a must-have skill for every data science professional. You should start from the basics of databases and structured query language (SQL) and learn everything you would need in any data science role, including writing and executing efficient queries, joining multiple tables, and appending and manipulating tables.
Data science projects are a bit like a treasure hunt, the treasure being the insights you extract from the data. The question is: what is the treasure worth? Well, that is decided by your stakeholders. The only way to get a good price is to communicate how insightful the results are and how this treasure can help them improve profits and the organization.
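As a small illustration of joining and aggregating tables, the sketch below runs a SQL query against an in-memory SQLite database using Python’s built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and their rows are all hypothetical and invented for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, created only for this example.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha', 'North'), (2, 'Ben', 'South'), (3, 'Chen', 'North');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# LEFT JOIN keeps customers with no orders; COALESCE turns a missing sum into 0.
query = """
    SELECT c.region,
           COUNT(o.order_id) AS order_count,
           COALESCE(SUM(o.amount), 0) AS total_amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
"""
region_totals = conn.execute(query).fetchall()
conn.close()

for region, order_count, total_amount in region_totals:
    print(region, order_count, total_amount)
```

A `LEFT JOIN` is used rather than an inner join so that a customer without orders still counts toward their region; which join is right depends on the question you are answering.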
This is where dashboarding comes in. A lot of data science transitioners ignore the dashboarding aspect because they focus on model building. But being able to communicate your thoughts and your key results to the stakeholder – that’s what separates a good data scientist from an amateur one.
Spending time on understanding what dashboarding is and how it works will give you a huge advantage.
Whatever we have covered so far has a lot to do with understanding different data science concepts. We’ve covered both the technical side (programming, machine learning, statistics, etc.) and the soft skills aspect (structured thinking).
So, what’s the next step for you in your transition journey?
It’s time to apply your knowledge in a practical scenario! Yes, you need to marry your theoretical knowledge with hands-on practical experience to truly stand out as a data science transitioner. Given your background, the best (and easiest) way to do this is to apply your learnings in your current data-based role.
There are broadly three ways you can do this.
Don’t just limit yourself to generating insights based on visual interpretations. Take a look at the image below – what’s your first reaction?
I can say that the average business sourced after the contest is higher than before. Now the question is whether the contest is the factor behind the boost in average business sourced, or whether it is just a random increase. Here, we need to rely on statistical concepts to support our insights, for example by running a z-test, a t-test, or another statistical test. Having a good knowledge of statistics will help you in these situations.
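As a rough sketch of how such a check could look, the code below computes Welch’s two-sample t-statistic and its approximate degrees of freedom using only the standard library. The `before_contest` and `after_contest` numbers are invented for illustration, and the sketch assumes the two groups are independent samples; if the same people were measured before and after, a paired test would be the more appropriate choice.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t_statistic, degrees_of_freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    mean_diff = statistics.mean(sample_b) - statistics.mean(sample_a)
    t_stat = mean_diff / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical business-sourced figures, made up for this example.
before_contest = [48, 52, 50, 47, 53, 50]
after_contest = [55, 58, 54, 60, 57, 58]

t_stat, dof = welch_t(before_contest, after_contest)
print("t-statistic:", round(t_stat, 2))
print("degrees of freedom:", round(dof, 2))
```

A large t-statistic relative to the t-distribution with these degrees of freedom suggests the increase is unlikely to be random. To get an actual p-value, you could use a statistics library; for instance, SciPy’s `scipy.stats.ttest_ind` with `equal_var=False` performs the same Welch test.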
You should have a solid understanding of the below statistics topics if you want to land a data science role:
And here is a list of useful resources to help you get started with these topics:
Performing detective and statistical analysis will not help you land a data science role if you don’t share your findings with the right group.
Presenting stories is one of the key skills a data science professional must possess.
Here, I strongly recommend practicing this storytelling skill in your current role as well. You can start with the following:
Here’s an essential recommendation that has personally helped me in my career – add visualization(s) to your slide(s). The words you write in the presentation (or speak during a meeting) should add context to your visualizations.
After building your model, you should share the results with your supervisor or the people who make decisions (like the team or project manager). As a data science professional, it is critical to share your findings (such as which features have an impact on the target variable). You should also give regular updates comparing your model’s results with the actual numbers.
This process will also help you to tune and improve your model. If the model is performing well, then there is a high chance you will get another assignment or get involved with the core data science team. That’s what we are aiming for, right?
If you would rather look for a role outside your current organization, then what are some of the things you can do?
This is another essential aspect of working in data science. We’ve seen the majority of transitioners skip this step and focus exclusively on picking up machine learning concepts – don’t do that!
Data science is still a very nascent field. We see major breakthroughs happening on a regular basis (sometimes a weekly basis!) and it can become difficult to keep up with all that’s happening. But if you can find time to catch up on the latest developments, you’ll already have an edge on your competition.
Let us give you an example. The Natural Language Processing (NLP) field has come a long way in the last 3 years (since 2017). We see a new language model seemingly every week that builds on the last major breakthrough. If you can keep up with this pace, if you can spend a bit of time understanding what’s going on, you’ll gain invaluable knowledge that your peers won’t have.
So what are the different ways in which you can stay up to date in the vast space of data science? Here are three suggestions based on our experience:
Making a career switch to data science to get a salary bump is entirely justified. However, it isn’t as straightforward as you might think. There are certain things, such as work experience and your current domain, that will play a MASSIVE role in deciding your salary post-transition.
Based on figures from Glassdoor, a popular and relatively accurate salary website, this is what the salary situation looks like for a data scientist:
As you can see, the average salary in India in 2020 is approximately INR 10,00,000 per year, whereas the average for the same role in the USA is $134,000 per year.
If you bring a bit more experience to the table and you have relevant domain experience, you might look at a more senior role:
As we said, it comes down to how relevant your previous experience is. Your relevant experience will definitely come in handy to move from the first graph to the second.
There has never been a better time to become a data scientist. Data science is a booming industry, but it also comes with its own set of challenges. Even though you already have an edge thanks to your experience working with data, there are a few challenges to look out for. If you have read this far, we know you can work through obstacles. Let’s take them up one by one:
Now that you are aware of the various components you’ll need to put together to make this career transition, are you prepared to buckle up and take this thrilling journey? The payoff is immense but as you might have gathered, you’ll face plenty of obstacles along the way. Your eventual success will come down to how well you can get past these hurdles.