What’s the key to cracking data science competitions? How do you use this experience to break into the data science industry? We regularly come across these questions from aspiring data scientists wondering how to make a name for themselves in data science.
Who better to answer these questions and provide in-depth insight into the data science world than a Kaggle Master and an Analytics Vidhya hackathon expert? Ladies and gentlemen, I’m delighted to present Sonny Laskar!
Sonny is an MBA postgraduate from IIM Indore, the place he credits for starting his data science journey. So if you are wondering whether it’s possible to make a career transition to data science from a non-data science field, this article is for you.
I found Sonny to be a very approachable person and his answers, as you’ll soon see, are very interesting, knowledgeable and rich with experience. Despite holding a senior role in the industry, Sonny loves taking part in data science competitions and hackathons and regularly scales the top echelons of competition leaderboards.
Sonny also holds a lot of experience in the data engineering side of this field. As you can imagine, there is a LOT we can learn from him. I had the opportunity to pick his brain about various data science topics and bring this article to you.
And a whole lot more! There is SO much to learn from Sonny’s knowledge and thought process. Enjoy the discussion!
Sonny Laskar: My Data Science journey started when I was pursuing my MBA at IIM Indore. Analytics was the go-to area for every aspirant. One of the early topics of discussion was how Target figured out a teen girl was pregnant before her father did. This made me very curious and I started to dive deep into the world of Data Science.
I had already worked extensively with data but mostly around engineering problems and business intelligence. No serious machine learning stuff was popular back then with organizations in India.
“I spent two months at the University of Texas, Austin in early 2014 and was surprised by the level of maturity they had with data. My visit to Dell’s headquarters in Austin and how they used social media data to enhance their product positioning was amazing. By the end of this, I was completely convinced that I needed to work on this.”
SL: I started my career in 2007 in the world of IT Infrastructure. In the initial six years, I was primarily working on building massive scale data warehousing applications (processing ~10TB data every). The focus was more on ETL and BI. Dashboards and Data marts were the primary output of all these efforts. This was what we called “Descriptive Analytics”.
By 2014-15, “Predictive Analytics” was already getting a lot of attention and adoption in the US. It was then that many organizations in India started looking at “Predictive Analytics” with significant focus. We were already processing Terabytes of data and were very well versed with the engineering side of things.
I was able to understand the fundamentals of Data Science very well since my Mathematics and Statistics concepts are strong and I had a fair exposure to programming.
I started with R since that was the programming language popular in academia, and improved my understanding by practicing writing code and replicating other people’s work.
During my MBA, I got a bird’s eye view of many statistical and Data Science approaches. Since the focus during the MBA was more on business, it didn’t allow me to master the technical skills to the level the industry needs. After my MBA, I started spending roughly 4-5 hours every day writing code and building on top of it.
I had already written a fair amount of code in Bash, JavaScript, PHP & Perl, so the learning curve was not very steep for me. I also invested in cloud subscriptions so that I could play with large volumes of data. I think it’s worth investing that money when you believe it is going to be helpful in the long term.
Patience, Perseverance & Practice have been my rule of thumb for everything in life, and that is what I applied here as well.
SL: Data Science is getting a lot of attention from the workforce. It is in fact very easy to get some training to understand the basic concepts (thanks to MOOCs). This leads to an excess supply of candidates, so recruiters need a way to filter them.
One of the best ways to stand out is to establish credibility by participating in data science competitions.
Just like most things in life, competitions have their pros & cons. There is a lot of preparatory work that gets done before a competition is published. That work is at times extremely complex and time-consuming, and needs multi-domain understanding.
Similarly, the competition ends with a leaderboard score without any view on what was done with the winners’ solutions. These are grey areas for many newcomers to Data Science, which creates a lot of issues when they join the industry.
I have conducted at least 100 in-person interviews in the last year and I can see this struggle very prominently. Data Scientists are not expected to just design a machine learning model to predict something. In many organizations, discussions in meeting rooms end up with a task for the Data Scientist such as “Let us build a model to predict X”.
A good Data Scientist might end up concluding that many such X use cases should not be solved at all with machine learning! A Data Science team is not expected to be very large in the real world. They might get involved in many tasks which are either not valuable or can be easily solved without using Machine Learning.
If they feel it can be solved with Machine Learning, then there must be a series of discussions to understand what data would help them address that.
“Unlike competitions, nobody gives you two .csv files called train and test and a nicely written evaluation metric. Almost 80% of the effort goes into defining the problem and getting and processing data. The remaining 20% goes into pure modeling and deployment.”
Exposure to competitions helps address a few parts of this:
These are very significant activities and hence recruiters use “competitions” as a good filter to focus on a smaller set of candidates.
To summarize, below are the key issues which competition focused people face when they join the industry:
SL: I got hooked on data science competitions back in 2016. I used to participate in as many competitions as I could! Lately, my personal interest has plateaued as the incremental learning has diminished. Now I participate only if I have time and the problem is very interesting.
I also try to participate in offline hackathons along with my Kaggle Grandmaster friend Sudalai Rajkumar (SRK). I usually participate based on three factors:
SL: As a beginner, it is important for folks to know the basic building blocks.
“I would strictly advise that they should not participate in any competition where the data set is large, and the problem statement is complex.”
They should start with relatively easy data science competitions. Below is what aspiring data scientists should do in the initial few weeks:
SL: As we participate in many competitions, we realize that there are a common set of steps that we always follow. We should try to create a template out of it which we can easily modify in every competition. This makes life simpler.
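To make the idea of a reusable template concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not Sonny's own template: the helper names (`load_rows`, `k_fold_indices`, `cross_validate`, `MeanBaseline`), the choice of RMSE as the metric, and the assumption that every column is numeric are all hypothetical choices made for the example.

```python
import csv
import io
import random
from statistics import mean


def load_rows(csv_file, target):
    """Read a CSV into feature rows and a target list (assumes numeric columns)."""
    X, y = [], []
    for row in csv.DictReader(csv_file):
        y.append(float(row.pop(target)))
        X.append([float(v) for v in row.values()])
    return X, y


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices once and deal them into k disjoint folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean squared error; swap in whatever metric the competition uses."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred)) ** 0.5


class MeanBaseline:
    """Placeholder model: always predicts the training mean."""

    def fit(self, X, y):
        self.value = mean(y)
        return self

    def predict(self, X):
        return [self.value] * len(X)


def cross_validate(make_model, X, y, metric, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and repeat for each fold."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, valid_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = make_model().fit([X[j] for j in train_idx],
                                 [y[j] for j in train_idx])
        preds = model.predict([X[j] for j in valid_idx])
        scores.append(metric([y[j] for j in valid_idx], preds))
    return scores


# Tiny in-memory stand-in for a competition's train.csv (made-up numbers).
sample = io.StringIO("x1,x2,target\n1,2,3\n2,3,5\n3,4,7\n4,5,9\n5,6,11\n6,7,13\n")
X, y = load_rows(sample, "target")
print(cross_validate(MeanBaseline, X, y, rmse, k=3))
```

In a new competition, only the loader, the metric, and the model class passed to `cross_validate` would need to change, while the validation loop stays the same.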
I follow the below process:
SL: Interesting question. Here is what I would recommend focusing on:
SL: AutoML will eventually automate most of the model building & model deployment part of the work. This will include feature engineering (to quite an extent).
“Domain knowledge, logical reasoning, and a problem-solving attitude are what a Data Scientist will be expected to excel at.”
Other key trends that I see:
SL: There are too many to list down! But here are my top 3 picks:
SL: I use XGBoost & LightGBM for most of my tasks. They work almost every time. For deep learning, Keras with TensorFlow seems perfect to me.
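Both of the tree libraries mentioned here are gradient boosting implementations. For readers new to the idea, the sketch below shows the core loop in plain Python: each round fits a small tree (here a one-split "stump") to the current residuals and adds a scaled-down copy of it to the ensemble. This is a deliberately simplified illustration for squared-error regression, not the API or the algorithmic details of either library; the names `Stump`, `fit_stump`, `fit_boosting`, and `predict` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A one-split regression tree: the weak learner used in each round."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, row):
        if row[self.feature] <= self.threshold:
            return self.left_value
        return self.right_value


def fit_stump(X, residuals):
    """Find the single split that minimizes squared error on the residuals."""
    overall = sum(residuals) / len(residuals)
    best = Stump(0, float("inf"), overall, overall)  # fallback: no useful split
    best_err = sum((r - overall) ** 2 for r in residuals)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if err < best_err:
                best, best_err = Stump(f, t, lv, rv), err
    return best


def fit_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model, row):
    """Sum the base value and every stump's scaled contribution."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(s.predict(row) for s in stumps)


# Toy data (made up): the target jumps from 0 to 10 once the feature passes 2.5.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
model = fit_boosting(X, y)
print(predict(model, [4.5]))
```

Production libraries add much more on top of this loop, such as deeper trees, regularization, and faster split finding, which is why they are the practical choice for real tasks.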
SL: Sudalai Rajkumar (SRK) any day!
SL: Here are a few tips from my experience:
I thoroughly enjoyed interacting with Sonny Laskar for this interview. His knowledge, his thought process and the way he articulates and structures his thoughts are things we can all learn from.
What did you learn from this interview? Are there other data science leaders you would want us to interview? Let me know in the comments section below!