You reach out to the elite. You try to learn from the best of the best. The data science experts who have scaled the hackathon ladder and tasted success firsthand.
In short, you learn from the Grandmasters themselves. We are thrilled to present the new “Kaggle Grandmaster Series”, where we interview top Kagglers from around the globe to bring their thoughts, insights, and experience to the Analytics Vidhya community.
In this first interview, we are joined by Firat Gonen who is a Kaggle Notebooks AND Discussions Grandmaster! That’s right – we are thrilled to host a 2X Grandmaster who will share his experience and knowledge with us.
Firat brings his 10+ years of experience in experimental methodology, visual attention and perception, decision-making and genetic algorithms, computational neuroscience, neural networks, machine learning, AI, fundamental engineering, and big data tools to this interview.
He also holds strong academic credentials, with Bachelor’s, Master’s, and Ph.D. degrees in Electrical Engineering.
Here’s a gem from Firat:
“In order to achieve any step in any domain in Kaggle, you need a lot of patience.” – Firat Gonen
There is a lot more elite advice and knowledge packed into this interview, so read on!
Firat Gonen (FG): When I was a bachelor’s student, I wasn’t aware of the “Data Science” field; perhaps the world wasn’t yet using terms like AI, Data Science, and Machine Learning so broadly!
During my junior and senior years in college in Istanbul, I joined a MEMS laboratory focused on building pico-laser projectors. I started spending quite some time with my seniors in the lab and, impressed by their work, I wanted to continue in this field. After starting my master’s program in optoelectronics in Houston, Texas, I was introduced to neuroscience, brain imaging, MRI, and visual attention. I was dazzled and decided that this was my field. I pulled the plug, left optics, and switched labs.
I remember my first lecture in this field: Neural and Cognitive Modeling. After that, I was hooked.
I was in a complex world of math, biology, anatomy, statistics, and medicine. Learning more and more over time, I was amazed at the rich history of this field. I still remember my advisor, Professor Haluk Ogmen, teaching us about early perceptrons, Rosenblatt, and the Minsky–Papert studies. We were learning about early studies and findings in the lectures, and back in the lab we were designing our own experiments and mastering the statistics for them.
FG: This was more than 10 years ago! That was my senior-year project in the MEMS Lab. I was trying to build a real-time 3D scanner using a laser input and a generic webcam. I remember developing it in Matlab back then. It was a nice introduction to signal processing, Kalman filters, triangulation, etc.
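For readers unfamiliar with Kalman filters, here is a minimal, illustrative 1D sketch in Python (the original project was in Matlab, and the signal and parameters below are hypothetical) showing how noisy scalar readings, such as per-point depth estimates from a scanner, might be smoothed:

import numpy as np

def kalman_1d(measurements, process_var=1e-4, meas_var=0.05):
    """Minimal 1D Kalman filter: smooths a noisy scalar signal."""
    x, p = 0.0, 1.0  # initial state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, only uncertainty grows
        p += process_var
        # Update: blend the prediction with the new measurement
        k = p / (p + meas_var)   # Kalman gain
        x += k * (z - x)
        p *= (1 - k)
        estimates.append(x)
    return np.array(estimates)

# Example: noisy depth readings around a true value of 2.0
noisy = 2.0 + np.random.normal(0, 0.2, size=100)
smoothed = kalman_1d(noisy)

The key idea is the Kalman gain k, which weights each new measurement against the running estimate according to their relative uncertainties.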
FG: One of my experiments during my Ph.D. program was based on human decision-making algorithms and whether we could model them using an eye tracker and an early version of a learning model. It was an interesting experiment and it gave me the title “Dr.”
The contingency-detection ability that we possess as mammals facilitates active exploratory behavior; detecting contingencies is an essential part of human intelligence and behavior. How to sample the environment, and how to make decisions from those samples, are foundational questions in perception and cognition.
At that time, several models explained human perception and decision-making as a means of optimizing a given criterion. Yet several studies of perception, cognition, and decision-making concluded that actual human behavior differs greatly from these decision-making models. For example, according to statistical theory, humans would be expected to maximize their sampling before deciding.
However, humans frequently choose small samples over large ones and show higher confidence in the resulting decisions. A general understanding of perceptual and cognitive processes is not possible until we understand why we prefer small samples to large ones. Possible explanations include “quick gut decisions”, fatigue, opportunity costs, and limited short-term memory; research has found a relation between the sample size used to make a decision and working memory capacity. Still, studies favoring small samples over large ones remained questionable back then, since they lacked a firm theoretical foundation.
“More recently a statistical decision framework has been proposed in which small samples surpass large samples (Small Sample Advantage, SSA) in decision-making in detecting stimulus contingencies. In other words, humans do not seek to maximize the number of samples but instead purposefully keep it small. Our goal was to understand how perceptual and cognitive processes operated in real-time in a natural dynamic scene.”
FG: 5 months ago, I joined Getir as the Head of Data Science & Analytics. It is the perfect place for a data scientist. A beautiful marriage between retail and technology.
“I can honestly say I learned a lot from each competition, and each domain helped me build business acumen over the years. I believe domain knowledge is very important, and competitions are the perfect environment to learn it.”
I don’t know of any alternative to this. Where else can one deep-dive into NLP one month and then struggle with earthquake data the next?
FG: I’ve been Kaggling for more than 2 years now, and each step takes time. There is a learning curve for each domain and each of them is very difficult.
I think the most obvious challenge is the very first start on Kaggle. I usually see a lot of people open their accounts, try a few things, and then leave. I think this happens for a couple of reasons.
In order to achieve any step in any domain on Kaggle, I think you need a lot of patience. There are several good write-ups on Kaggle about how to start Kaggling, as well as detailed accounts of veteran Kagglers’ experiences. I highly recommend newcomers read those.
I think my biggest challenge was similar: dedicating the time. It’s not easy to dedicate the required amount of attention and time alongside a private life and a career.
FG: Several Kaggle Competition Grandmasters suggest that creating an end-to-end pipeline, even though it’s a simple one, would help a lot. I need to follow that advice I guess. I am usually a laissez-faire guy.
“I started reading more and more discussions before jumping into code, and I think this really helps. I am now a very good reader, and I can clearly say that it helps.”
FG: Actually, I like to believe that I tried to balance it across the tiers. When I became a Discussions Grandmaster, I already had 4 competition medals placing me in the top 1,000, and when I achieved my second Grandmaster title (Notebooks), I already had my 5th competition medal. I had also already reached Master tier in Datasets.
“I really love the idea that Kaggle is actually a huge community, and sharing ideas or resources helps a lot. The Notebooks and Discussions tiers encourage us to help each other and to share great ideas and methodologies.”
Like in every online community or forum, the majority of Kagglers are novices and newcomers. They need good resources, and you can’t provide those by competing alone. You can see that several high-ranking Kagglers share a lot of great material, whether in notebooks or discussions.
FG: There are several great notebooks on Kaggle, and they are built in very different ways and with different aims. Some of them help a lot during competitions, some excel in specific areas like time series forecasting or BERT, and several of them help you a lot with EDA.
Some people spend weeks on notebooks, some hours. Several notebooks are forked thousands of times; some help you achieve a gold medal in a competition.
I guess one needs to understand this, check them all out, and decide on their own. The only common thing between them is that they are built to help, and that’s what matters.
“My way was to keep it simple, short, and very easy to understand in order for a complete beginner to read, understand, and learn new stuff, that’s it!”
FG: I am proud to be a Kaggle Grandmaster but the goal shouldn’t be to become one!
“They should be focusing on learning, sharing, and discussing. If they have a goal of becoming an expert in a specific field like computer vision or NLP, that’s really good, and they should focus on that.”
I really like seeing a new Kaggler begin his/her journey, become really experienced in one particular domain, start sharing, and get rewarded with a Kaggle rank. So, in short, the focus should be on the experience.
FG: Good question! Kaggle is a great place to build a strong data science profile.
“Apart from that, a good Data Scientist needs a strong background in several fields: linear algebra, probability, statistics, computer science fundamentals, and coding.”
After the fundamentals, it becomes much easier to dive into Machine Learning and Statistical Learning. Depending on the company, distributed systems and big data tools can also come in handy.
Once one becomes accustomed to the technical aspects, he/she needs to focus on business understanding and should try to understand complex conventional business models. Over the years I have learned that business insight, good judgment, and quick decision-making in your own business domain are as important as being able to create great Machine Learning pipelines.
Wow – what a great interview and a sparkling start to our Kaggle Grandmaster Series! Firat’s analytical approach to answering is something out of the ordinary. I hope this interview will help you to set your course right and rise up the data science leaderboard rankings!
Let us know in the comments if you have any other questions that you think we missed. You can also drop any questions you feel you want to ask a future interviewee – we’d love to focus on your thoughts as well!