Kaggle Grandmaster Series – Notebooks Grandmaster Mobassir Hossen’s Journey from Software Engineer to Data Science

Analytics Vidhya Last Updated : 30 Dec, 2020

9 min read

“Being dynamic is important, especially in the Data Science field.” – Mobassir Hossen

Whatever your background, transitioning into data science or becoming a Grandmaster takes a lot of dedication and self-learning, along with the ability to be a dynamic learner.

Not sure where to start?

Well, how about hearing from another Kaggle Grandmaster? That’s right – we are proud to present this third installment in our Kaggle Grandmaster Series with Notebooks Grandmaster Mobassir Hossen!


Mobassir is a Kaggle Notebooks Grandmaster with a Kaggle rank of #44. He is also a Kaggle Discussions Master and a Kaggle Competitions Expert. He graduated with a Software Engineering degree from Daffodil International University (DIU) and currently works as a Data Scientist at Markopolo.ai.

A journey from software engineering to data science? That’s one a lot of people would love to know more about!

In this interview, we cover a range of topics, including:

Mobassir Hossen’s Transition from Software Engineering to Data Science
Mobassir’s Interest and Experience in Healthcare
Mobassir’s Kaggle Journey from Scratch to Becoming a Grandmaster

So, go through this interview and absorb all you can!

Mobassir Hossen’s Transition from Software Engineering to Data Science


Analytics Vidhya (AV): You did your Bachelor’s in Software Engineering. How did you make the transition from SWE to Data Science?

Mobassir Hossen (MH): Many different fields are open to a software engineer, and I was fascinated by a lot of them. I invested a lot of time in software security, the Internet of Things, embedded system design, and so on. I kept jumping from one ship to another like that and couldn’t settle on a single field for my career.

The main problem was that I loved all of those areas, and it was hard to pick just one for my career. One day, while reading papers on the Internet of Things (IoT), I came across a project idea that I thought would be cool to implement: a system that detects the carbon monoxide (CO) concentration in a room. I had read in some papers that if the CO concentration crosses a certain threshold, it can kill the people staying in that room.

So I began searching for a way to tackle this problem. Then I heard people talking about an algorithm called SVM (support vector machine) that could be used to classify CO levels from sensor data collected with an Arduino. My next Google search was “what is SVM?”. Google told me that it’s a machine learning algorithm, and that is how I came to know about machine learning. I was in the 1st semester of my 3rd year at that time. I started taking machine learning courses to understand algorithms like SVM so I could solve the IoT problem I had in my head, and somehow I became extremely addicted to machine learning and started investing a huge amount of time learning about it. This is how I dropped my IoT project and picked Data Science as my career.
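The CO project described above maps naturally onto a binary classifier. As a rough illustration only, the sketch below trains a linear soft-margin SVM on a single feature using plain full-batch subgradient descent on the hinge loss. The readings, the "safe"/"dangerous" labels, the scaling by 1/100, and the hyperparameters are all hypothetical; they are not real CO safety limits and not anything from Mobassir's project. In practice you would normally use a library implementation such as scikit-learn's `SVC` instead of hand-rolling the optimizer.

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=5000):
    """Fit w, b for a 1-D linear soft-margin SVM.

    Minimises lam/2 * w**2 + mean(max(0, 1 - y * (w * x + b))) with plain
    full-batch subgradient descent. Labels must be -1 (safe) or +1 (dangerous).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = lam * w, 0.0  # regularise w only, not the bias
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1:  # inside the margin: hinge term is active
                grad_w -= y * x / n
                grad_b -= y / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (dangerous) or -1 (safe) for one scaled reading."""
    return 1 if w * x + b >= 0 else -1


# Hypothetical CO readings in ppm, scaled by 1/100 so the feature is roughly in [0, 1].
safe_ppm = [5, 10, 15, 20, 25]
dangerous_ppm = [70, 80, 90, 100, 120]
xs = [p / 100 for p in safe_ppm + dangerous_ppm]
ys = [-1] * len(safe_ppm) + [1] * len(dangerous_ppm)

w, b = train_linear_svm(xs, ys)
print(f"w={w:.2f}, b={b:.2f}, boundary at about {-b / w * 100:.0f} ppm")
```

The learned boundary `-b / w` plays the role of the "certain threshold" from the IoT papers, except that it is estimated from labeled sensor data instead of being written by hand, which mirrors the Software 1.0 vs Software 2.0 distinction Mobassir draws in the next answer.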

AV: Since software engineers already know how to program, what additional things should they focus on to make this transition?

MH: Software engineers know the programming required for Software 1.0 (behavior written explicitly as hand-coded rules), whereas data science demands programming skills for Software 2.0 (behavior learned from data). In software engineering, we studied statistics and mathematics, and that helped my transition. Beyond that, I think the following points also play a role in the transition from SWE to DS:

Adaptability: Data science demands different coding and analytical skills than software engineering, and to acquire them we need to “practice” these new skills a lot, just as we did during our undergraduate studies when solving Software 1.0 problems.
Dynamic learner: I think most good software engineers already have this skill, and it helps in the data science field too. You will have to do a lot of Google searches, read state-of-the-art (SOTA) papers, and keep up with recent work in the field. A lot of people don’t do this, but I think it is a very important skill to have.
I have seen a lot of top-notch software engineers who don’t want to learn data science because the field changes day by day, and keeping up means constantly following the latest research. That’s why I think “we need to be dynamic learners”.

Mobassir’s Interest and Experience in Healthcare


AV: I noticed you’re interested in healthcare startups. What specifically do you look for in machine learning use cases at healthcare startups?

MH: Here are some of the things I especially look for:

Early detection and prediction of a particular disease with the aim of “saving lives”
Detecting many conditions, such as pneumothorax, is really challenging, and even radiologists with years of experience can make mistakes, so I plan to design tools that assist radiologists and doctors with smart, data-driven solutions
I look for ways to assist doctors and reduce “wrong treatment” by lowering false negatives and false positives so that we can save lives. A lot of people die daily because of wrong treatment (for example, a patient is marked safe at an early stage of his/her disease, but later it costs a life because the doctor made a mistake, as all of us humans do)
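The false-negative/false-positive trade-off mentioned above can be made concrete with a few lines of counting. This is a minimal sketch in plain Python, assuming binary labels where 1 means the disease is present; the patient data is made up for illustration. A false negative here is exactly the case described in the last point: a sick patient marked safe.

```python
def error_rates(y_true, y_pred):
    """Compute screening error rates for binary labels (1 = disease present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sick patient marked safe
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy patient flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {
        "false_negative_rate": fnr,
        "false_positive_rate": fpr,
        "sensitivity": 1 - fnr,  # also called recall or true-positive rate
        "specificity": 1 - fpr,
    }


# Hypothetical screening results for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(error_rates(y_true, y_pred))
```

In this made-up example one of four sick patients is missed, giving a false-negative rate of 0.25, while one of six healthy patients is flagged (a false-positive rate of about 0.17).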

AV: Can you suggest any good datasets or competitions where people interested in healthcare can participate?

MH: It depends. If someone is willing to solve tabular data problems then:

If someone is willing to solve Radiological/computer vision-related problems then:

For assisting dermatologists, one can start with “SIIM-ISIC Melanoma Classification”

Also, Intel & MobileODT Cervical Cancer Screening and APTOS 2019 Blindness Detection are great datasets to explore.

Actually, it depends on the individual’s interest; there are lots of medical data problems. You need to ask yourself “which medical problem do I want to solve the most?” and start from there. As I said, it’s a “dynamic process”. You can start with a problem and realize, “Well, I don’t know much about this problem, and I also don’t know how to solve it with data-driven approaches, but now I am interested.” You can start with zero knowledge and still end up being a pro. Almost all the resources you need are available; all you need to do is spend a lot of time googling and reading papers, notebooks, books, etc.

Mobassir’s Kaggle Journey to Becoming a Grandmaster


AV: You’re the first Kaggle Notebooks Grandmaster from Bangladesh, which must feel great. What were the challenges you faced on this journey?

MH: I still remember how it all started. It took me 77 days to finish Andrew Ng’s Machine Learning course on Coursera when I was a 3rd-year undergraduate student. I became so addicted to machine learning that I sacrificed university quizzes, presentations, exams, etc., even though I had a high CGPA until then.

Why did I do that? They call it “passion” these days 🙂

When I first started my ML journey, every CS student at my university was busy solving competitive programming problems, but I found machine learning very interesting. I wanted to learn ML, but no one around me had even basic ML knowledge, so no one could guide me well.

During my initial days into ML, the answers I received for my crucial career-related questions were very demotivating.

My question: “I am interested in machine learning. I want to become a Data Scientist. Is it the wrong idea/decision?”

Answer 1: “Mobassir, machine learning is the hottest topic now, but what happens if some other technology replaces machine learning after 10 years? What will you do? So do Codeforces competitions instead.”

Answer 2: “In Bangladesh, very, very few companies work on machine learning problems. You are unlikely to get a job with this skill here, so learn web/Android frameworks and regularly solve competitive programming problems only.”

Answer 3: “If you don’t have heavy math/statistics knowledge, then don’t go for machine learning.”

These replies really worried me and led to a lot of self-doubt. But anyway, I signed up on Kaggle two years ago and became part of a community so diverse and collaborative that there was no looking back. Today I am very proud that I rejected the guidance of the people around me and followed my own, toward “my passion”.

AV: I’m sure you participate in the discussions and competitions, but how did you end up entering the Notebooks side of Kaggle and even earn the Grandmaster title there?

MH: When I started my data science journey, I already had a very busy academic schedule, so I couldn’t spend much time on Kaggle. I started by “participating in the discussions”, and my SWE background helped me learn quickly. I collaborated with people in the discussion forums, and some of them later became good buddies with whom I still compete on Kaggle.

As I said, this field demands people with a “dynamic learning attitude”. I have no special talent, but I realized that I do have a “dynamic learning attitude”. That’s why, after so many fluctuations, I decided to build my career in the DS field, which led me to invest a lot of time in the kernels/notebooks section, and the Grandmaster title from Kaggle followed. I can assure you that for at least 70% of the notebooks I wrote, “I started with ZERO knowledge, did a lot of Google searches, and read others’ solutions and discussions, and by the time I wrote the last lines of those kernels, I knew something”. That is why I think “being dynamic is important, especially in the DS field”.

AV: Do you follow certain steps while creating the notebooks? Can you share them?

MH: Yes, I have learned a few techniques from highly experienced Kagglers, and I try to apply them most of the time. They are as follows:

Don’t print too much log information. If a particular code cell prints too much, there is a good chance that many people won’t read your notebook fully; they just dislike “scrolling forever”
People come to the DS field from different backgrounds, and not everyone has a strong coding background, so I try to explain what a particular code block is doing with visual graphs or words so that everyone can understand. I see a lot of people write beautiful code but not describe what it is doing; they just want to show the world that “they can write beautiful code”
While creating a notebook, I check almost all the related notebooks and try to find a gap. If I find a gap or something that “no one has tried yet”, I simply try to implement it and bring it into my notebook; it is more like a research and development process
I always give a reference for content and code that I take from elsewhere, so there is always a reference section in my work. I see a lot of people who don’t do this; sometimes they simply change variable names, function names, etc. and pretend it is their own work, which is a very bad practice. I keep this in mind while creating notebooks
In data science problems, “lack of domain knowledge” is a big issue, and every data problem calls for different domain knowledge. So in my notebooks I sometimes share domain knowledge about a particular data problem, which I learn by “googling”, and sharing it saves others a lot of time ☺
I try to write clean code, but sometimes it gets messy
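The first tip, keeping cell output short, often comes down to how often a training loop reports progress. Below is a minimal sketch, assuming a toy loop in which the line `loss *= 0.95` is a hypothetical stand-in for a real training step: it logs only every `log_every` epochs plus the final one. The `log` parameter defaults to `print` but accepts any callable, which keeps the behavior easy to test.

```python
def train(num_epochs, log_every=10, log=print):
    """Run a toy training loop, reporting progress only every `log_every` epochs."""
    history = []
    loss = 1.0
    for epoch in range(1, num_epochs + 1):
        loss *= 0.95  # placeholder for a real training step (hypothetical)
        history.append(loss)
        # Log periodically and on the final epoch instead of on every step.
        if epoch % log_every == 0 or epoch == num_epochs:
            log(f"epoch {epoch:4d}  loss {loss:.4f}")
    return history


history = train(num_epochs=100, log_every=25)  # 4 log lines instead of 100
```

The full `history` is still returned, so it can be plotted in one compact figure rather than scrolled through as text.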

AV: If somebody is starting from scratch and wants to create industry-ready notebooks, can you share five points they should keep in mind?

MH: Here are my five points:

Clear documentation of each segment in Markdown cells, plus comments that describe why a function is needed rather than what it does
Write reusable components, avoid code duplication, and always use descriptive variable names
Don’t make the notebook super long: it is best to create a local library of reusable components. Move functions, routines, and classes into a .py file and import them from that local module
To use notebooks not only for rapid prototyping but also for long-term productivity, log key process events so that errors can be diagnosed more easily and the entire process can be monitored
Practice proper unit testing
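Several of these points (a reusable component, a docstring that explains *why*, logging of process events, and unit tests) can be seen together in one small sketch. The helper is hypothetical: median imputation is only an illustrative choice of reusable step, and the module name `notebook_utils` is an assumption. In a real project the function and its tests would live in files such as `notebook_utils.py` and a separate test file, and the notebook would only import the function.

```python
import logging
import unittest

# Hypothetical module name; in practice this code would sit in notebook_utils.py
# and the notebook would do `from notebook_utils import fill_missing_with_median`.
logger = logging.getLogger("notebook_utils")


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values.

    Why it exists: many models cannot accept missing inputs, and the median is
    less sensitive to outliers than the mean.
    """
    observed = sorted(v for v in values if v is not None)
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    mid = len(observed) // 2
    if len(observed) % 2:
        median = observed[mid]
    else:
        median = (observed[mid - 1] + observed[mid]) / 2
    # Log the event so the pipeline can be monitored and debugged later.
    logger.info("imputed %d missing value(s) with median %s",
                len(values) - len(observed), median)
    return [median if v is None else v for v in values]


class FillMissingWithMedianTest(unittest.TestCase):
    def test_odd_number_of_observed_values(self):
        self.assertEqual(fill_missing_with_median([1, None, 3, 5]), [1, 3, 3, 5])

    def test_even_number_of_observed_values(self):
        self.assertEqual(fill_missing_with_median([2, None, 4]), [2, 3.0, 4])

    def test_all_missing_raises(self):
        with self.assertRaises(ValueError):
            fill_missing_with_median([None, None])

    def test_imputation_is_logged(self):
        with self.assertLogs("notebook_utils", level="INFO") as captured:
            fill_missing_with_median([None, 7])
        self.assertIn("imputed 1 missing value(s)", captured.output[0])


if __name__ == "__main__":
    # exit=False keeps a notebook kernel alive after the test run.
    unittest.main(argv=["ignored"], exit=False)
```

Passing `argv=["ignored"]` stops `unittest` from trying to parse the notebook kernel's own command-line arguments; from a terminal you would normally run `python -m unittest` instead.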

End Notes

Wow! What an inspiring interview that was. Such wise words can only come after a lot of experience.

Mobassir’s journey is a testament to the fact that one never knows how many doors can open simply by listening to yourself. I hope this interview helps you answer your DS career-related questions more precisely.

This is the third interview in the series of Kaggle Interviews. You can read the first 2 interviews here-

What did you learn from this interview? Are there other data science leaders you would want us to interview? Let me know in the comments section below!

Analytics Vidhya

Analytics Vidhya Content team
