You reach out to the elite. You try to learn from the best of the best. The data science experts who have scaled the hackathon ladder and tasted success firsthand.
In short, you learn from the Grandmasters themselves. We are thrilled to present the new “Kaggle Grandmaster Series”, where we interview top Kagglers from around the globe to bring their thoughts, insights, and experience to the Analytics Vidhya community.
In this first interview, we are joined by Firat Gonen who is a Kaggle Notebooks AND Discussions Grandmaster! That’s right – we are thrilled to host a 2X Grandmaster who will share his experience and knowledge with us.
Firat brings his 10+ years of experience in experimental methodology, visual attention and perception, decision-making and genetic algorithms, computational neuroscience, neural networks, machine learning, AI, fundamental engineering, and big data tools to this interview.
He also holds strong academic credentials, with Bachelor’s, Master’s, and Ph.D. degrees in Electrical Engineering.
Here’s a gem from Firat:
“In order to achieve any step in any domain in Kaggle, you need a lot of patience.” – Firat Gonen
There is a lot more elite advice and knowledge packed into this interview, so read on!
Firat Gonen (FG): When I was a bachelor’s student, I wasn’t aware of the “Data Science” field; perhaps the world wasn’t yet using terms like AI, Data Science, and Machine Learning so broadly!
During my junior and senior years in college in Istanbul, I joined a MEMS laboratory focused on building pico-laser projectors. I started spending quite some time with my seniors in the lab and, impressed by their work, I wanted to continue in this field. After starting my master’s program in optoelectronics in Houston, Texas, I was introduced to neuroscience, brain imaging, MRI, and visual attention. I was dazzled and decided that this was my field. I pulled the plug, left optics, and switched labs.
I remember my first lecture in this field: Neural and Cognitive Modeling. After that, I was hooked.
I was in a complex world of math, biology, anatomy, statistics, and medicine. Learning more and more over time, I was amazed at the rich history of this field. I still remember my advisor, Professor Haluk Ogmen, teaching us about early perceptrons, Rosenblatt, and the Minsky–Papert studies. We were learning about early studies and findings in the lectures, and back in the lab we were designing our own experiments and mastering the statistics for them.
FG: This was more than 10 years ago! That was my senior-year project in the MEMS Lab. I was trying to build a real-time 3D scanner using a laser input and a generic webcam. I remember developing it in Matlab back then. It was a nice introduction to signal processing, Kalman filters, triangulation, etc.
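For readers unfamiliar with Kalman filters, here is a minimal, illustrative 1D sketch in Python (the original project was in Matlab, and the signal and parameters below are hypothetical) showing how noisy scalar readings, such as per-point depth estimates from a scanner, might be smoothed:

import numpy as np

def kalman_1d(measurements, process_var=1e-4, meas_var=0.05):
    """Minimal 1D Kalman filter: smooths a noisy scalar signal."""
    x, p = 0.0, 1.0  # initial state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, only uncertainty grows
        p += process_var
        # Update: blend the prediction with the new measurement
        k = p / (p + meas_var)   # Kalman gain
        x += k * (z - x)
        p *= (1 - k)
        estimates.append(x)
    return np.array(estimates)

# Example: noisy depth readings around a true value of 2.0
noisy = 2.0 + np.random.normal(0, 0.2, size=100)
smoothed = kalman_1d(noisy)

The key idea is the Kalman gain k, which weights each new measurement against the running estimate according to their relative uncertainties.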
FG: One of my experiments during my Ph.D. program was based on human decision-making algorithms and whether we could model them using an eye tracker and an early version of a learning model. It was an interesting experiment and it gave me the title “Dr.”
The contingency-detection ability that we possess as mammals facilitates active exploratory behavior; detecting contingencies is an essential part of human intelligence and behavior. How to sample the environment, and how to make decisions from those samples, are foundational questions in perception and cognition.
At that time, several models explained human perception and decision-making as a means of optimizing a given criterion. Yet several studies of perception, cognition, and decision-making concluded that actual human behavior differs greatly from these decision-making models. For example, according to statistical theory, humans would be expected to maximize their sampling before deciding.
However, humans frequently choose small samples over large ones and show higher confidence in the resulting decisions. A general understanding of perceptual and cognitive processes is not possible until we understand why we prefer small samples to large ones. Possible explanations include “quick gut decisions”, fatigue, opportunity costs, and limited short-term memory; research has found a relation between the sample size used to make a decision and working memory capacity. Still, studies favoring small samples over large ones remained questionable back then, since they lacked a firm theoretical foundation.
“More recently a statistical decision framework has been proposed in which small samples surpass large samples (Small Sample Advantage, SSA) in decision-making in detecting stimulus contingencies. In other words, humans do not seek to maximize the number of samples but instead purposefully keep it small. Our goal was to understand how perceptual and cognitive processes operated in real-time in a natural dynamic scene.”
FG: 5 months ago, I joined Getir as the Head of Data Science & Analytics. It is the perfect place for a data scientist. A beautiful marriage between retail and technology.
“I can honestly say I learned a lot from each competition, and each domain helped me build business acumen over the years. I believe domain knowledge is very important, and competitions are the perfect environment to learn it.”
I don’t know of any alternative to this. Where else can one deep-dive into NLP one month and then struggle with earthquake data the next?
FG: I’ve been Kaggling for more than 2 years now, and each step takes time. There is a learning curve for each domain and each of them is very difficult.
I think the most obvious challenge is the very first start on Kaggle. I usually see a lot of people open their accounts, try a few things, and then leave. I think this happens for a couple of reasons.
In order to achieve any step in any domain on Kaggle, I think you need a lot of patience. There are several good write-ups on Kaggle about how to start Kaggling, as well as detailed accounts of veteran Kagglers’ experiences. I highly recommend newcomers read those.
I think my biggest challenge was similar: dedicating the time. It’s not easy to dedicate the required amount of attention and time alongside a private life and a career.
FG: Several Kaggle Competition Grandmasters suggest that creating an end-to-end pipeline, even though it’s a simple one, would help a lot. I need to follow that advice I guess. I am usually a laissez-faire guy.
“I started reading more and more discussions before jumping into code, and I think this really helps. I am now a very good reader, and I can clearly say that it helps.”
FG: Actually, I like to believe that I tried to balance it across the tiers. When I became a Discussions Grandmaster, I already had 4 competition medals placing me in the top 1,000, and when I achieved my second Grandmaster title (Notebooks), I already had my 5th competition medal. I had also already reached Master tier in Datasets.
“I really love the idea that Kaggle is actually a huge community, and sharing ideas or resources helps a lot. The Notebooks and Discussions tiers encourage us to help each other and to share great ideas and methodologies.”
Like in every online community or forum, the majority of Kagglers are novices and newcomers. They need good resources, and you can’t provide those by competing alone. You can see that several high-ranking Kagglers share a lot of great material, whether in notebooks or discussions.
FG: There are several great notebooks on Kaggle, and they are built in very different ways and with different aims. Some of them help a lot during competitions, some excel in specific areas like time series forecasting or BERT, and several of them help you a lot with EDA.
Some people spend weeks on notebooks, some hours. Several notebooks are forked thousands of times; some help you achieve a gold medal in a competition.
I guess one needs to understand this, check them all out, and decide on their own. The only common thing between them is that they are built to help, and that’s what matters.
“My way was to keep it simple, short, and very easy to understand in order for a complete beginner to read, understand, and learn new stuff, that’s it!”
FG: I am proud to be a Kaggle Grandmaster but the goal shouldn’t be to become one!
“They should be focusing on learning, sharing, and discussing. If they have a goal of becoming an expert in a specific field like computer vision or NLP, that’s really good, and they should focus on that.”
I really like seeing a new Kaggler begin his/her journey, become really experienced in one particular domain, start sharing, and get rewarded with a Kaggle rank. So, in short, the focus should be on the experience.
FG: Good question! Kaggle is a great place to build a strong data science profile.
“Apart from that, a good Data Scientist needs a strong background in several fields: linear algebra, probability, statistics, computer science fundamentals, and coding.”
After the fundamentals, it becomes much easier to dive into Machine Learning and Statistical Learning. Depending on the company, distributed systems and big data tools can also come in handy.
Once one becomes accustomed to the technical aspects, he/she needs to focus on business understanding and should try to understand complex conventional business models. Over the years I have learned that business insight, good judgment, and quick decision-making in your own business domain are as important as being able to create great Machine Learning pipelines.
Wow – what a great interview and a sparkling start to our Kaggle Grandmaster Series! Firat’s analytical approach to answering is something out of the ordinary. I hope this interview will help you to set your course right and rise up the data science leaderboard rankings!
Let us know in the comments if you have any other questions that you think we missed. You can also drop any questions you feel you want to ask a future interviewee – we’d love to focus on your thoughts as well!