Exclusive Interview with 2x Kaggle Master Gilles Vandewiele!

Analytics Vidhya Last Updated : 19 Nov, 2020

Introduction

“I think one of the nice things about the data science field is that it is so multi-disciplinary and that anyone who aspires to become a data scientist can do so.” – Gilles Vandewiele

Golden words!

As a beginner in data science, this quote gives me a lot of hope, given that I, like many other data science aspirants, don’t come from a scientific or technical background. For people like us, having someone whose journey we can look up to and learn from is really important.

To ease the process, we are excited to bring you an exclusive interview with Gilles Vandewiele. He is a 2x Kaggle Master, in both the Competitions and Discussions categories.

He has already won 3 Competition Gold Medals this year. He actively participates in Kaggle discussions, where he helps others based on his experiences and learnings. He’s the perfect community presence to learn from!

Gilles is also a Ph.D. student in Machine Learning at the Internet and Data Science Lab (IDLab) research group in the Department of Information Technology (INTEC) at Ghent University. There he is conducting research in the domain of white-box machine learning for critical domains and (semantic) knowledge models.

This is a highly insightful interview for beginners in Data Science. So take it all in and enjoy your journey!

Gilles Vandewiele’s Education

Himanshi Singh (HS): We have many members in the community who want to shift from Computer Science to the Data Science field. They would like to know: how did you transition from CS engineering to DS?

Gilles Vandewiele (GV): My transition was rather smooth as I started a Ph.D. in Machine Learning at IDLab (Ghent University) directly after finishing my master’s degree in CS engineering there.

I think one of the nice things about the DS field is that it is so multi-disciplinary and that anyone who aspires to become a data scientist can do so. Of course, some degrees, such as CS and mathematics, do make this transition easier but it is definitely not a requirement to become a data scientist.

HS: You mentioned that one of your research topics is white box ML models, especially for critical domains. What is white box ML and why should other people learn more about it?

GV: We typically make a distinction between white and black box ML models. White box models are techniques that are inherently interpretable, such as decision trees, linear regression, and Bayesian networks. On the other hand, we have black box models that are very difficult to explain, such as neural networks. While there are techniques out there that can highlight why a model makes a specific prediction, such as SHAP, these techniques are only able to give local, instance-based explanations, and it remains impossible to fully grasp the internals of the model.

In critical domains, where decisions have significant consequences (e.g. law, health, and finance), it is of key importance that ML techniques support the expert in making decisions instead of making the decisions for them. This importance is being increasingly recognized, as we are seeing a surge in the domain of eXplainable AI (xAI).
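To make the white box idea above concrete, here is a minimal sketch in plain Python. It fits a one-feature linear regression by ordinary least squares, so the whole model is two numbers that an expert can read directly. The data and the names `fit_line`, `hours`, and `scores` are made up for illustration and are not taken from Gilles' work.

```python
# Minimal white-box sketch: a one-feature linear regression fit by ordinary
# least squares. The toy data below is hypothetical and only for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: hours studied vs. exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

slope, intercept = fit_line(hours, scores)


def predict(x):
    """Predict a score; the full reasoning is visible in this one line."""
    return slope * x + intercept


# The entire model can be stated as a single readable formula.
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```

Every prediction here can be traced back to the slope and intercept, which is the kind of global transparency that local explanation tools applied to a black box model do not provide.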

Gilles’s Kaggle Journey from Scratch to becoming a Master

HS: Can you describe your Kaggle journey from the beginning till now in a few points?

GV: I got to know Kaggle in my final master year, 5 years ago, as part of a project of a Machine Learning course in which we had to recognize traffic signs. I am a very competitive person and remember that I spent a lot of time on that project as I wanted to end up high on the leaderboard. While the result was not that great (only finished 20th out of 31 teams), I did learn a lot.

I then Kaggled on and off over the next two years, mostly joining playground competitions to hone my skills. It was only around 2019 that I started Kaggling on a frequent basis by continuously participating in competitions, one at a time. I achieved Kaggle Expert status roughly 10 months ago and Kaggle Master status 5 months ago. 2020 was a good year for me, as I was able to win 3 Competition Gold Medals.

HS: We hear all the time how real-world applications are different from hackathons. What’s your experience with applying your hackathon knowledge in the real world?

GV: Projects on Kaggle and in the real world definitely have some differences at first sight, but on closer inspection they have more similarities than one would think. In real-world projects, a lot of time and work needs to be invested in the earlier and later steps of a typical data science pipeline (such as data collection, data cleaning, model visualization, …). While a data scientist should have some experience in each of the steps of such a pipeline, we cannot expect everyone to be an expert in all of them. Therefore, I think Kaggle is the ideal place to hone your skills in the modeling and analysis part of the pipeline, even more so than most real-world projects. The main reason why Kaggle is a better learning environment than the real world is that your boundaries are pushed further by other competitors: you want to end up high on the leaderboard and thus create a solution that is better than the other solutions (of which there are often thousands); in the real world, you create a solution that fulfills the needs of the clients and then you are done.

HS: You’re currently ranked 12th as a Discussion Master. What are some of the key takeaways from your discussion journey that helped you in your data science career?

GV: Actively reading and participating in discussions helped me to better understand many different subjects: you learn new things by reading other people’s posts and you better understand the things you know once you have to explain them to others.

HS: I also noticed that you have created very intuitive write-ups that explain the models you have built. Could you share some tips on how to explain the solution to a problem?

GV: It is definitely not easy to create a good write-up, and it is something I can still improve on myself. Nevertheless, being able to explain your solutions to people with all kinds of backgrounds is a very important skill for a data scientist. I typically start out with a schematic drawing of my solution, which helps to structure my post and also gives me an overview of the components that need to be discussed. I then pay more attention to the components that I struggled to understand myself and try to explain them in the way that helped me to understand the subject. It can also help to mentally go back in time to before the competition (when you did not know anything about the data and the problem) and see whether it would have been possible to understand the post at that time. You could also ask a friend to check your post and see whether they can understand it.

HS: Most people tend to focus on the competitions on Kaggle. Why did you choose to enter the discussion aspect of Kaggle?

GV: I never focused on the discussion aspect solely. All of my discussions are made in the context of competitions in which I participated myself.

But I do spend a reasonable amount of my time on discussions, as they also help me learn a lot. I think some of the most valuable learning experiences on Kaggle are made in a team, as you learn from others. Similarly, discussing ideas on the forum helps to understand the problem and the data at hand.

HS: Is there a specific framework that you use for the discussions or hackathons? Our community would love to hear your ideas and how you approach these problems.

GV: I wish I could say that I have a nice structured approach and workflow for all of my competitions, but I am a very chaotic person. I make a lot of copies of the same notebook with small changes and my competition directory quickly becomes a huge mess. If there is one piece of advice that I can give, it is that iterating very fast during these competitions is of key importance. You need to set up a pipeline quickly and make some simplifications to it that increase its efficiency without sacrificing too much of its performance. After that, a lot of different ideas need to be implemented in a trial-and-error fashion. While implementing these ideas, it is important to keep an overview of which ideas did and did not work. In the end, all of the working ideas can then be integrated into the pipeline.
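One possible way to keep the overview of which ideas did and did not work is a small experiment log. The sketch below is only an illustration under stated assumptions: the class `ExperimentLog`, its method names, the idea names, and the scores are all hypothetical and do not describe Gilles' actual workflow.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentLog:
    """Hypothetical helper that records each idea and whether it beat a baseline."""

    baseline_score: float
    higher_is_better: bool = True
    entries: list = field(default_factory=list)

    def record(self, idea, score):
        """Store one trial and return True if it improved on the baseline."""
        if self.higher_is_better:
            worked = score > self.baseline_score
        else:
            worked = score < self.baseline_score
        self.entries.append({"idea": idea, "score": score, "worked": worked})
        return worked

    def working_ideas(self):
        """List the ideas that improved on the baseline, in the order they were tried."""
        return [entry["idea"] for entry in self.entries if entry["worked"]]


# Made-up validation scores from a simplified, fast pipeline.
log = ExperimentLog(baseline_score=0.80)
log.record("target encoding", 0.82)
log.record("drop rare categories", 0.79)
log.record("add lag features", 0.81)

# Candidates to integrate into the full pipeline at the end.
print(log.working_ideas())
```

Each idea is compared against the baseline on its own here, so the combined pipeline would still need to be validated after the working ideas are merged.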

Gilles’ Advice for Beginners in Data Science

HS: Given how much discussions have helped you, what would you suggest for beginners? Where should they begin and what should they focus on?

GV: This will perhaps sound cliche, but my main piece of advice would be to “not hold back”. When you start Kaggling, you should not care about your results but rather about how much you learn. I sometimes hear from others that they do not want to participate on Kaggle because they are afraid of ending up badly on the leaderboard. I think that’s a big mistake.

One other piece of advice that I would like to give is to “never take shortcuts” or “game” the system. We sometimes see malpractices in the notebook, discussion, and dataset tiers where people spam others on LinkedIn for upvotes, or just plagiarize other people’s work. This will never pay off in the long term.

HS: Are there any areas apart from hackathons that you feel people should focus on to build their profile?

GV: In order to build your own profile, personal branding is important. Definitely share your achievements from hackathons on different social media. Also, blog posts (e.g. with write-ups of your solutions) help to reach people that do not participate in hackathons. Finally, a website or live CV is also a good thing. I would suggest making that as early as possible so that you can extend it over time.

End Notes

I thoroughly enjoyed interacting with Gilles Vandewiele via this interview. He has a clear structure to his thoughts and his enthusiasm to share his experience is something all beginners will benefit from.

This is the third interview in the series of Kaggle Interviews.

What did you learn from this interview? Are there other data science leaders you would want us to interview? Let me know in the comments section below!
