“My transition from SWE to Data Science/AI is still not complete; I am working on it every day.”
Becoming a Kaggle Grandmaster in any category is a function of daily practice. Only this kind of iteration can sharpen your skills and make you industry-ready.
In the 13th edition of the Kaggle Grandmaster Series, we have Peter Pesti joining us.
Peter is a Kaggle Notebooks Grandmaster and currently ranks 23rd with 15 gold medals to his name. He also holds a Master title in the Discussion category and an Expert title in the Competitions category.
Further, Peter completed his Master’s in Computer Engineering at Veszprémi Egyetem. For the past 4 years, he has been a self-employed software engineer.
You can go through the previous Kaggle Grandmaster Series Interviews here.
Peter Pesti: In the past 18-20 years, I have worked as a Software Engineer/Software Developer for many different companies. It was a great 20 years; I have worked with many excellent developers/engineers, and I have learned a lot during these years. We’ve built a variety of software, from small mobile applications to large-scale ERP systems.
Honestly, I worked too much. About 4-5 years ago, there were periods when I sat in front of my monitor 13-14 hours every day. Not surprisingly, I burned out. I knew it was time to take a break and do something else.
At that time, I read about AlphaGo Zero. After I read that article, I was amazed at the level of A.I. The same day, I bought my first Machine Learning course on Udemy. I did not know what I was jumping into. The first course led to another, and so on. Since then, I’ve learned a lot about ML/DL/AI. I’ve finished many online courses on Udemy, Udacity, Coursera, and edX.
My transition from SWE to Data Science/AI is still not complete; I am working on it every day.
PP: I think there are two main types of challenges when you want to write successful notebooks.
The first is a technical one. You have to write the code. When I wrote my first Kaggle Kernel, I made lots of rookie mistakes even though I had 20 years of programming experience. At first, it takes more time to write even a simple notebook. You will introduce bugs; the code will be hard to read; the formatting will be messy, etc. If you don’t give up and keep practicing, you’ll get better and better with time. You will have more and more reusable code, and eventually, you will be able to publish a GM-level kernel within an hour.
For me, the more difficult challenge was a non-technical one. When I first published, I thought I had written an excellent notebook, but I got only a few votes. Timing and a bit of marketing are as important as the code itself. For example, if you write an EDA kernel for a competition weeks before it ends, you won’t get any votes.
The easiest time to get enough votes is during a competition. On the other hand, the competition for those votes is much stiffer. After my first few failures, I looked for ideas: what types of notebooks achieved gold level? EDA, starter kernels, etc. So I wrote an EDA, which I published within hours after the competition started. The notebook was great (in my opinion), but SRK posted his own EDA (much better than mine, of course). As a beginner, you can’t compete with a GM, even if the quality of your work is identical. The reason is simple: most readers will read the GM’s notebook, not yours. I was aiming for the votes and a gold medal; that was also a mistake. I should have aimed for bronze kernels. It is much easier to get five votes for a bronze medal.
After these failures, I wrote an explanation kernel for a scoring metric Kaggle used in an ongoing competition. There was a bit of a misunderstanding about how it was calculated. I thought it would be useful for others too, but I did not expect that kind of success. I got more than 100 votes and my first gold medal.
PP: EDA Kernels. In my opinion, data analysis skills are a must for every Data Scientist. But before that, every beginner needs good coding skills. You don’t have to be a Python expert, but you will need a bit of confidence to write the necessary Python code. You will also need to learn a few libraries, like NumPy and Pandas, and a plotting tool, Matplotlib or Plot.ly, for example. After you have these basics, you can start writing your own EDA notebooks.
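A minimal sketch of the first steps such an EDA notebook might take, using only NumPy and Pandas. The toy table, its column names, and the `quick_eda` helper are hypothetical and stand in for a real competition dataset; a real notebook would go on to add plots with a tool such as Matplotlib.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a competition's training file.
df = pd.DataFrame({
    "age": [22, 35, np.nan, 41, 29, 35],
    "fare": [7.25, 71.28, 8.05, 53.10, 8.46, 26.00],
    "survived": [0, 1, 1, 1, 0, 0],
})


def quick_eda(frame: pd.DataFrame, target: str) -> dict:
    """Collect the basic facts many EDA notebooks open with."""
    numeric = frame.select_dtypes(include="number")
    return {
        # Number of rows and columns.
        "shape": frame.shape,
        # Count of missing values per column.
        "missing": frame.isna().sum().to_dict(),
        # Count, mean, std, min, quartiles, max for numeric columns.
        "summary": numeric.describe(),
        # Pearson correlation of each numeric feature with the target.
        "target_corr": numeric.corr()[target].drop(target),
    }


report = quick_eda(df, target="survived")
print(report["shape"])
print(report["missing"])
print(report["summary"].round(2))
print(report["target_corr"].round(2))
```

Keeping these checks in one small function is one way to build up the kind of reusable code that makes later notebooks faster to write.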
PP: Most of the shared content on the discussion forums is related to some competition. For a beginner, for educational purposes, these topics are not useful. One exception may be the write-up topics. Lots of teams share their solutions after the competition ends.
I think online courses are much more valuable for a beginner. There are many MOOCs and e-learning sites from which you can choose: Udemy, Coursera, Udacity, edX, Khan Academy, Kaggle, and Analytics Vidhya, to name a few. The possibilities are endless.
On the other hand, if you participate in a competition, the discussion forum is a must. When I enter a new competition, I subscribe to the discussions, and I read all of the topics and comments from day one to the end.
PP: It is challenging to keep up with everything; so much is happening in the industry every day. I think the only way is to choose a small part and focus on that. After I learned the basics in many areas (Computer Vision, NLP, Time Series, Reinforcement Learning, etc.), I started to focus on computer vision. I read a lot about this topic every day. If I have time, I read some new papers on arXiv. Besides my job, Kaggle, learning, and my personal life, I don’t have much time left, but I do my best. I keep hundreds of open tabs in Chrome 🙂
PP: Applying deep learning is still rare, at least in Hungary. It is happening slowly, but something seems to have started. The government is taking this industry more seriously and has funded a few exciting projects. Bigger companies are posting more and more open positions.
It is hard to predict, but I think the change will be dramatic in the next 10-20 years.
PP: Learn, learn, learn. This industry is changing so fast that what you’ve learned at a university won’t be enough five or ten years from now.
Building a professional online portfolio is a great way to show your knowledge. Be active on Kaggle; your goal should be at least one GM title. Three is better 🙂 Open source coding is also a good starting point.
PP: There are so many incredible people; it is hard to name just a few. If I am looking for great EDA kernels, I always check SRK, Heads or Tails, or Chris Deotte.
On the discussion forum, I read everything that Heng, CPMP, or Andrew share.
Outside of Kaggle, I try to read as many blogs and articles as I can. I read everything from Google Brain, Facebook AI, OpenAI, and Uber. Occasionally, when I have a bit more time, I read interesting new arXiv publications.
Peter’s journey is a testament to the fact that you have to work on your knowledge base every day to be a data scientist. I hope this interview gives you some important lessons to apply in your own journey.
This is the 13th interview in the Kaggle Grandmaster Series.
What did you learn from this interview? Are there other data science leaders you would want us to interview for the Kaggle Grandmaster Series? Let me know in the comments section below!