“During a competition, the difference between a top 50% and a top 10% is mostly the time invested.” – Theo Viel
2021 is here, and the story of most budding data scientists trying to triumph in Kaggle competitions continues the same way it always has: they invest too little time and give up way too early.
Theo Viel is someone beginner-level Kagglers should look up to if they find themselves getting frustrated quickly. In this 12th edition of the Kaggle Grandmaster Series, Theo joins us to share his deep learning and NLP journey and his Kaggle experience!
Theo is a Kaggle Competitions Grandmaster, ranked 30th with 6 gold medals. He also holds the Master title in the Notebooks and Discussions categories on Kaggle. He started his Kaggle journey 2 years ago and has focused on Deep Learning competitions.
He recently completed his Master’s Degree in Applied Mathematics. Since then he has been working as a Deep Learning Researcher for a French startup called DAMAE Medical, where he builds models for skin-related problems (cancer detection, segmentation).
You can go through the previous Kaggle Grandmaster Series Interviews here.
Theo Viel (TV): I started my NLP journey 2 years ago when I found an internship where I worked on sentiment analysis topics. I had no experience at the time and was hoping to find an internship in one of the two dominant fields in Deep Learning (NLP and Computer Vision). At that time, Deep Learning for NLP consisted mostly of models based on Recurrent Neural Networks, which were a good place to start.
It was also around this time that I discovered Kaggle and entered the Quora Insincere Questions Classification competition, since it was close to my internship topic. My internship ended in December 2018, but I have kept competing in NLP competitions since then, even though the methods have changed a lot with the rise of transformers.
TV: All you really need to start a Kaggle NLP competition is in the HuggingFace library, but most people already use it.
TV: For Kaggle competitions, I try to keep up to date with the transformer literature. But I have to admit that once again the HuggingFace library covers more than enough to perform well.
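To give a feel for how little boilerplate the HuggingFace library requires, here is a minimal sketch of scoring texts with a pretrained transformer for a text-classification competition. The checkpoint name, label count, and example sentences are illustrative assumptions, not details from the interview.

```python
# Minimal sketch: scoring texts with a pretrained transformer via HuggingFace.
# The checkpoint name, label count, and example sentences are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # assumed checkpoint; any encoder works similarly
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["Is this question sincere?", "Another example question"]  # placeholder inputs
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(**batch).logits          # shape: (batch_size, num_labels)
    probs = torch.softmax(logits, dim=-1)   # class probabilities per text
print(probs)
```

In practice, the same few calls cover most competition baselines; the work then shifts to data cleaning, validation strategy, and fine-tuning schedules.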
Regarding my professional career, the work I do involves keeping updated with the state of the art, so I read a lot of papers related to my topics of interest. This is true for every field in Machine Learning I guess.
TV: I learned most of my Deep Learning skills by myself during my internships or during Kaggle competitions, but I already had a good mathematical background. Deep Learning was the logical continuation of my studies, as I liked (and was good at) maths and programming.
Having an Applied Mathematics degree also influences the way I reason. I always look for an explanation of how things work, and why things don’t work. In Deep Learning, most of the things you observe make sense, therefore good reasoning will help you a lot when experimenting.
TV: It really depends on the country you live in. I live in France so I can only reply to its specific case. I think almost everybody here who starts a career in Deep Learning has at least a Master’s degree. Having a Ph.D. symbolizes excellence, but is not at all necessary to do 99% of the data science-related jobs.
If you already feel sharp enough after your Master’s to start working, there is no real need for a Ph.D., as recruiters are fine with a Master’s degree. However, if you want to do fundamental research in a specific field, or teach Deep Learning, then doing a Ph.D. is worthwhile. Also, top-tier ML labs don’t consider candidates who do not have a (solid) Ph.D.
TV: During a competition, the difference between a top 50% and a top 10% is mostly the time invested. The competitions where I do not end up in the medal range are the ones I didn’t really work on and wasn’t able to beat the baselines in.
Then, grabbing a medal is mostly about beating the baselines which can be done with enough experimenting, and with a good understanding of the problem.
The jump to the top 1% is a bit more complicated. I believe the things that mattered most for me are:
TV: I only enter Deep Learning competitions, for which I know my hardware will not be too much of a bottleneck. NLP competitions are ideal because it is faster to train a text model than an image model. But I mostly enjoy computer vision competitions, as I found them to be more interesting.
Before entering a competition, I often lurk in the forums to get familiar with the topic and to see if I have some interesting ideas worth trying.
TV: These are the steps I usually go through when approaching a Kaggle competition:
I also think working in a team adds a lot of value. I would advise starting out alone, pushing the results as far as you can, and then trying to merge with people who have roughly the same score as you.
Regarding the real-world approach, I have a job that is actually close to what I do in a Kaggle competition. Only steps 2 to 5 apply though, and what I do is guided by what the product needs instead of by the leaderboard. You might hear a lot that Kaggle is nowhere close to what people do in their jobs; that is not really the case for me.
For me, Text-to-speech and NLP are two very different things. Text-to-speech is closer to audio processing than text processing (NLP). Text-to-speech is an interesting topic but I think it does not have enough applications to become the next “big” thing.
However, NLP is a much more promising field as its applications are numerous. Nowadays, becoming good at NLP is almost equivalent to being good at Deep Learning. Being able to build a robust NLP pipeline (with transformers or RNNs) is a good skill to have, and something I believe not a lot of people can do.
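As a complement to the transformer sketch above, here is a minimal sketch of the RNN-style text classifier that pre-transformer NLP pipelines were built around. The vocabulary size, dimensions, and toy batch are illustrative assumptions only.

```python
# Minimal sketch of an RNN-based text classifier (pre-transformer style).
# Vocabulary size, dimensions, and the toy batch below are assumptions.
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        outputs, _ = self.rnn(embedded)        # (batch, seq_len, 2 * hidden_dim)
        pooled = outputs.mean(dim=1)           # simple mean pooling over tokens
        return self.head(pooled)               # (batch, num_classes) logits

# Toy usage: a batch of 4 already-tokenized sequences of length 20.
model = RNNClassifier()
fake_batch = torch.randint(1, 30000, (4, 20))
print(model(fake_batch).shape)  # torch.Size([4, 2])
```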
His passion for NLP is clearly what is helping him progress, and we hope interviews like this one spark the same passion in you.
This is the 12th interview in the Kaggle Grandmaster Series. You can read the previous ones at the following links:
What did you learn from this interview? Are there other data science leaders you would want us to interview for the Kaggle Grandmaster Series? Let me know in the comments section below!