Exclusive Interview with Sonny Laskar – Kaggle Master and Analytics Vidhya Hackathon Expert

By Pranav Dar. Last Updated: 30 May, 2019


Introduction

What’s the key to cracking data science competitions? How do you use this experience to break into the data science industry? We regularly come across these questions from aspiring data scientists wondering how to make a name for themselves in data science.

Who better to answer these questions and provide an in-depth insight into the data science world than a Kaggle Master and an Analytics Vidhya hackathon expert? Ladies and gentlemen, I’m delighted to present Sonny Laskar!

Sonny is an MBA graduate of IIM Indore, which he credits with starting his data science journey. So if you are wondering whether it’s possible to make a career transition to data science from a non-data science field, this article is for you.

I found Sonny to be very approachable, and his answers, as you’ll soon see, are insightful and rich with experience. Despite holding a senior role in the industry, Sonny loves taking part in data science competitions and hackathons and regularly scales the top echelons of competition leaderboards.

Sonny also has extensive experience on the data engineering side of this field. As you can imagine, there is a LOT we can learn from him. I had the opportunity to pick his brain about various data science topics and bring this article to you.

We covered a variety of data science topics during our conversation:

Sonny’s background and his first role in data science
The difference between data science competitions and industry projects
Sonny’s framework and approach to data science competitions
His advice to aspiring data scientists

And a whole lot more! There is SO much to learn from Sonny’s knowledge and thought process. Enjoy the discussion!

Sonny Laskar’s Background and First Role in Data Science

Pranav Dar: You are currently the Associate Director of Automation and Analytics at Microland, have finished in the top 3 of AV’s hackathons four times, and hold a runner-up finish in a Kaggle competition. It’s been quite a ride!

How and where did your data science journey begin?

Sonny Laskar: My Data Science journey started when I was pursuing my MBA at IIM Indore. Analytics was the go-to area for every aspirant. One of the early discussion topics was how Target figured out a teen girl was pregnant before her father did. This made me very curious, and I started to deep dive into the world of Data Science.

I had already worked extensively with data, but mostly on engineering problems and business intelligence. Serious machine learning work was not yet popular with organizations in India back then.

“I spent two months at the University of Texas, Austin in early 2014 and was surprised by the level of maturity they had with data. My visit to Dell’s headquarters in Austin, where I saw how they used social media data to enhance their product positioning, was amazing. By the end of this, I was completely convinced that I needed to work on this.”

PD: Your professional career didn’t start off in data science. The first 6 years or so were spent on data warehousing and infrastructure.

So what kind of challenges did you face when you were getting into data science? How did you overcome them?

SL: I started my career in 2007 in the world of IT Infrastructure. In the initial six years, I was primarily working on building massive scale data warehousing applications (processing ~10TB data every). The focus was more on ETL and BI. Dashboards and Data marts were the primary output of all these efforts. This was what we called “Descriptive Analytics”.

By 2014-15, “Predictive Analytics” was already getting a lot of attention and adoption in the US. It was then that many organizations in India started looking at “Predictive Analytics” with significant focus. We were already processing Terabytes of data and were very well versed with the engineering side of things.

I was able to understand the fundamentals of Data Science very well since I have a strong foundation in Mathematics and Statistics and had fair exposure to programming.

I started with R since it was the programming language popular in academia, and I improved my understanding by practicing writing code and replicating other people’s work.

During my MBA, I got a bird’s eye view of many statistical and Data Science approaches. Since the MBA focused more on business, it didn’t let me master the technical skills to the level the industry needs. After my MBA, I started spending roughly 4-5 hours every day writing code and building on top of it.

I had already written a fair amount of code in Bash, JavaScript, PHP & Perl, so the learning curve was not very steep for me. I also invested in cloud subscriptions so that I could play with large volumes of data. I think that money is worth spending when you believe it will help in the long term.

Patience, Perseverance & Practice have been my rule of thumb for everything in life, and I applied them here as well.

Industry Experience versus Data Science Competitions

PD: We often hear from hiring managers how aspiring data scientists participate in hackathons and competitions and struggle to bridge the gap during their transition into an industry role.

You have been on both sides of this – you hold rich experience in data science and have excelled in hackathons. What has been your experience in the industry vs. hackathon debate?

SL: Data Science is getting a lot of attention from the workforce. It is in fact very easy to get some training in the basic concepts (thanks to MOOCs). This leads to an oversupply of candidates, and recruiters then need ways to filter them.

One of the best ways to stand out is to establish credibility by participating in data science competitions.

Just like most things in life, competitions have their pros & cons. A lot of preparatory work gets done before a competition is published. That work is at times extremely complex, time-consuming, and needs multi-domain understanding.

Similarly, a competition ends with a leaderboard score, with no view of what is done with the winners’ solutions afterwards. These grey areas are unfamiliar to many first-timers in Data Science, which creates a lot of issues when they join the industry.

I have conducted at least 100 in-person interviews in the last year and I can see this struggle very prominently. Data Scientists are not expected to just design a machine learning model to predict something. In many organizations, discussions in meeting rooms end up with a task for the Data Scientist such as “Let us build a model to predict X”.

A good Data Scientist might end up concluding that many such X use cases should not be solved with machine learning at all! Data Science teams are usually not very large in the real world, yet they might get pulled into many tasks which are either not valuable or can easily be solved without Machine Learning.

If they feel it can be solved with Machine Learning, then there must be a series of discussions to understand what data would help them address that.

“Unlike competitions, nobody gives you two .csv files called train and test and a nicely written evaluation metric. Almost 80% of the effort goes into defining the problem and getting and processing data. The remaining 20% goes into pure modeling and deployment.”

Exposure to competitions helps address a few parts of this:

Processing data and feature engineering
Building different types of models and getting the best score

These are very significant activities and hence recruiters use “competitions” as a good filter to focus on a smaller set of candidates.

To summarize, below are the key issues that competition-focused people face when they join the industry:

Building business acumen: understanding how a problem statement serves the business goals and what data drives it
Having a problem solver attitude
Understanding the software engineering side of production deployment
Story-telling: Ability to communicate the results to non-technical folks

Data Science Hackathons and Competitions

PD: Ever since data science started becoming mainstream in the last 5 years, multiple competitions keep happening across platforms simultaneously. How do you pick and choose which data science hackathon or competition you’ll participate in?

SL: I got hooked on data science competitions back in 2016 and used to participate in as many as I could! Lately, my personal interest has plateaued as the incremental learning has diminished. Now I participate only if I have time and the problem is very interesting.

I also try to participate in offline hackathons along with my Kaggle Grandmaster friend Sudalai Rajkumar (SRK). I usually participate based on three factors:

The novelty of the problem: If the problem statement is new to me, whether from a familiar domain or one I might not have enough experience in, I like to play with the data as it helps me build intuition about that problem or domain
Data size: I love problems where the data size is extremely large. I like the kick I get when I run models on machines with 500 GB of RAM and 64-core processors. It is a lot of fun!
Multiple approaches: I like problems where there are multiple techniques I can experiment with. In fact, our first Kaggle competition required us to perform both Text Analytics & Image Analytics, and to find a clear way to merge the two

PD: How should a beginner go about participating in these data science hackathons? Which kind of competition should they first dip their toes into?

SL: As a beginner, it is important for folks to know the basic building blocks.

“I would strongly advise them not to participate in any competition where the data set is large and the problem statement is complex.”

They should start with relatively easy data science competitions. Below is what aspiring data scientists should do in the initial few weeks:

Understand the data well. Do not jump straight into running xgb.train
Read about what transformations are effective for your problem & model:
- Example: Does one-hot encoding help, or is numeric label encoding better? Does the column have too many categories? Can we reduce them? Is that numeric field really a number or a category?
Feature Engineering is key, and your early learning on feature engineering will come from other people’s code. So, build a habit of reading others’ code line-by-line and replicating it. Ask yourself questions like: Why did the author do that? How does that help?
- Kaggle kernels are an excellent place to read
- On Analytics Vidhya, participants upload their code which beginners should read
Get familiar with the process of building models using different algorithms
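The encoding questions above can be made concrete with a small, dependency-free sketch. It assumes a raw column arrives as a plain Python list; the function names (`profile_column`, `group_rare`, `label_encode`, `one_hot_encode`) and the thresholds are illustrative choices, not something taken from the interview. In practice, pandas and scikit-learn offer built-in equivalents.

```python
from collections import Counter


def profile_column(values, max_categories=10):
    """Summarize a raw column before deciding how to encode it."""
    counts = Counter(values)
    return {
        "n_unique": len(counts),
        # True if every value is already an int or float.
        "all_numeric": all(isinstance(v, (int, float)) for v in values),
        # Few distinct values suggests a category, even when the values are
        # numbers (e.g. a store ID or a rating code). The threshold is arbitrary.
        "maybe_categorical": len(counts) <= max_categories,
        "top_values": counts.most_common(3),
    }


def group_rare(values, min_count=2, other="OTHER"):
    """Collapse categories seen fewer than min_count times into one bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def label_encode(values):
    """Map each category to an integer code; this imposes an arbitrary order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Expand each category into its own 0/1 column; no order is implied."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


if __name__ == "__main__":
    cities = ["Pune", "Delhi", "Pune", "Indore", "Delhi", "Goa"]  # toy data
    print(profile_column(cities))
    reduced = group_rare(cities)          # Indore and Goa become "OTHER"
    print(label_encode(reduced)[0])       # [2, 0, 2, 1, 0, 1]
    print(one_hot_encode(reduced)[1])     # ['Delhi', 'OTHER', 'Pune']
```

As a rough rule of thumb, tree models such as XGBoost often cope with label codes, while one-hot columns avoid implying a false order for linear models; grouping rare levels first keeps the number of one-hot columns manageable.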

PD: How should aspiring data scientists approach a competition?

SL: As we participate in many competitions, we realize that there is a common set of steps we always follow. We should turn these steps into a template that we can easily modify for every competition. This makes life simpler.

I follow the below process:

Build a naïve base model using all features and basic feature engineering
Record each change and score in an Excel sheet to track progress
Do hyperparameter tuning by hand (without spending too much time) to get something decent
Go back to data understanding and rework the features completely
Explore the data, build visual plots to see the patterns, etc.
Read discussions, kernels, etc.
Repeat all these steps
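The process above lends itself to a reusable template. Below is a minimal sketch of the "record each change and score" step, assuming a CSV file (which opens directly in Excel) rather than an Excel workbook. The file name, column names, and both helper functions are hypothetical.

```python
import csv
import os
from datetime import datetime, timezone

FIELDS = ["timestamp", "change", "cv_score"]


def log_experiment(change, cv_score, path="experiments.csv"):
    """Append one row per change so progress can be compared over time."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header only once
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "change": change,
            "cv_score": f"{cv_score:.5f}",
        })


def best_experiment(path="experiments.csv", higher_is_better=True):
    """Return the logged row with the best validation score."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: float(row["cv_score"]))
```

A typical loop would call `log_experiment("naive baseline: all features", score)` after the first model, then once more after each feature rework or round of hand tuning, and check `best_experiment()` before the next iteration. Set `higher_is_better=False` for error metrics such as RMSE.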

Data Science Industry-Related

PD: What are 3 critical aspects of a data science project which you feel are often overlooked by newcomers?

SL: Interesting question. Here is what I would recommend focusing on:

Taking Models to Production:
- In the real world, taking models to production takes a lot of effort. There are many things that data scientists need to do from a software engineering perspective, like building Docker containers, setting up a CI/CD pipeline, exposing REST APIs for prediction, version control, etc.
Understanding the Importance of SQL:
- SQL is the one thing every data scientist should learn, irrespective of which programming framework they use. They will almost certainly end up using it
Learning to write efficient code for Big Data:
- Badly written code might not be a problem when working on a small dataset, but it becomes a show-stopper when we run it against large datasets. Such scenarios can be handled by restructuring the code. For example, “for-loops” can be very slow when they have to iterate over a long list. Instead, use lambda functions and functional constructs such as map and filter. There are many functional programming guidelines that need to be followed
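A minimal sketch of the functional style for the for-loop example above, in plain Python. Both functions compute the same total from hypothetical "user,amount" lines. The functional version chains `map` and `filter`, which return lazy iterators in Python 3, so records are processed one at a time without building an intermediate list. In plain Python this is not automatically faster than a loop; the bigger benefit is that small, side-effect-free functions like `parse` can be handed to distributed engines (for example PySpark's `RDD.map` and `RDD.filter`) without rewriting.

```python
def parse(line):
    """Side-effect-free: turn a 'user,amount' line into a (user, amount) pair."""
    user, amount = line.split(",")
    return user, float(amount)


def is_valid(record):
    """Keep only records with a positive amount."""
    return record[1] > 0


def total_loop(lines):
    # Imperative style: materializes every parsed record in a list first,
    # then loops over that list a second time.
    records = []
    for line in lines:
        records.append(parse(line))
    total = 0.0
    for record in records:
        if is_valid(record):
            total += record[1]
    return total


def total_functional(lines):
    # Functional style: map and filter are lazy, so each line flows through
    # parse -> is_valid -> sum without an intermediate list.
    valid = filter(is_valid, map(parse, lines))
    return sum(amount for _, amount in valid)


if __name__ == "__main__":
    # A generator stands in for a large file read line by line.
    big_input = (f"user{i % 3},{i}" for i in range(100_000))
    print(total_functional(big_input))
```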

PD: AutoML is coming up huge in the industry. What are some other trends in data science we can expect to see in the next 2-3 years?

SL: AutoML will eventually automate most of the model building & model deployment part of the work. To quite an extent, this will include feature engineering as well.

“Domain knowledge, logical reasoning, and a problem-solving attitude are what Data Scientists will be expected to excel at.”

Other key trends that I see:

Adoption of Graphs in Machine Learning: Most folks do not use graphs. That’s a travesty! Graphs are amazing structures for solving many complex problems
Augmented Analytics: Augmented Analytics uses machine learning and natural language processing to automate data preparation and insight generation, and to enable data sharing
Autonomous Systems: Autonomous Systems, like driverless cars, can make decisions on their own. Reinforcement learning is behind this. One of the products we are building at Microland is for “Autonomous IT”, which will observe what a human does when there is a problem, learn that behavior, and replicate it in real time
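To illustrate one simple way graphs can feed a conventional model (our example, not necessarily what Sonny has in mind), the sketch below derives pairwise features from an undirected graph: node degree, common neighbors, and Jaccard overlap. The node names and edges are hypothetical; libraries such as NetworkX provide these measures and many more.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def pair_features(graph, a, b):
    """Graph-derived features for a candidate pair (a, b)."""
    # .get avoids inserting unseen nodes into the defaultdict.
    na, nb = graph.get(a, set()), graph.get(b, set())
    common = na & nb
    union = na | nb
    return {
        "degree_a": len(na),
        "degree_b": len(nb),
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical example: questions as nodes, known related pairs as edges.
    g = build_graph([("q1", "q2"), ("q1", "q3"), ("q2", "q3"), ("q3", "q4")])
    print(pair_features(g, "q1", "q4"))
```

Columns like these can be appended to an ordinary feature table and fed to a model such as XGBoost or LightGBM.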

Rapid Fire Questions: Sonny’s Take on Various Data Science Aspects

PD: Tell us 3 things you have learned working in data science.

SL: There are too many to list down! But here are my top 3 picks:

Domain Knowledge is key
Being “Jack of Many Trades” helps a lot
Always think out-of-the-box

PD: Which is your favorite machine learning/deep learning algorithm and why?

SL: I use XGBoost & LightGBM for most of my tasks. They work almost every time. For deep learning, Keras with TensorFlow seems perfect to me.

PD: Which data science professional would you pick to take part in a high-stakes data science competition?

SL: Sudalai Rajkumar (SRK) any day!

PD: What’s your advice to people trying to get their first data science role?

SL: Here are a few tips from my experience:

Do not try to learn two languages at the same time. Master any one which you like. Ignore all the news that you hear like “Language X is better than language Y”, etc.
Build a decent GitHub profile with all the different types of problems you have tried to solve
Take an open problem where you can get data and build some Data Science application around that
Finally, participate in competitions and make it to the top!

End Notes

I thoroughly enjoyed interacting with Sonny Laskar for this interview. His knowledge, his thought process, and the way he articulates and structures his thoughts are something we can all learn from.

What did you learn from this interview? Are there other data science leaders you would want us to interview? Let me know in the comments section below!

Pranav Dar

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
