I’m a big fan of hackathons. Participating in them over the past few years has taught me a great deal about data science, and that knowledge has, in turn, accelerated my professional career.
There is a caveat, though: winning a data science hackathon is really hard. Just think about the number of obstacles in your way.
A single decimal point can be the difference between a top-10 and a top-50 finish. Isn’t this why we love hackathons in the first place? The thrill of seeing your hard work pay off as you climb the leaderboard is unparalleled.
So, we’re thrilled to bring you the top 3 winning approaches from the Innoplexus Sentiment Analysis hackathon! You are going to be awestruck by how these three data scientists thought through their solutions and came up with their own unique frameworks.
There is a LOT to learn from these approaches. Trust me, take the time to go through the steps and understand where they came from. Then ask yourself whether you would have done anything differently. And then go ahead and take part in these hackathons yourself on our DataHack platform!
So let’s begin, shall we?
Hosting hackathons with our partner Innoplexus is always exciting. Each time, they come up with problem statements based on Natural Language Processing (NLP), an immensely popular field right now. We have seen huge developments in NLP thanks to transfer learning models such as BERT, XLNet and GPT-2.
Sentiment analysis is one of the most common NLP projects data scientists work on. This Innoplexus hackathon was a 5-day contest in which more than 3,200 data scientists from across the globe competed for job opportunities and exciting prizes offered by Innoplexus.
It was a hard-fought contest, with more than 8,000 submissions and a wide variety of approaches from the best in the business vying for the top spots.
For those of you who could not make it to the top, or could not find time to work on the problem, we have collated the winners’ approaches and solutions to help you appreciate and learn from them. So here goes.
There are a lot of components that go into building the narrative of a brand. It isn’t just built and controlled by the company that owns the brand. Think about any big brand you are familiar with and you’ll instantly understand what I’m talking about.
For this reason, companies constantly monitor platforms such as blogs, forums and social media to check the sentiment around their own products, as well as competitor products, to learn how their brand resonates in the market. This analysis feeds into various aspects of their post-launch market research.
This is relevant to many industries, including pharmaceuticals, where companies want to know how people feel about their drugs.
But this comes with several challenges. First, the language used in this type of content is often not grammatically correct. We frequently come across sarcasm. Some people cover several topics, each with a different sentiment, in a single post. Others post comments simply to indicate their sentiment around the topic.
Broadly speaking, sentiment can be grouped into 3 major buckets: positive, negative and neutral.
In the Innoplexus Sentiment Analysis Hackathon, participants were given samples of text, each of which could contain one or more drug mentions. Each row contained a unique combination of a text and a drug mention. Note that the same text could carry different sentiments for different drugs.
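To make the row layout concrete, here is a minimal Python sketch. The `Row` class, the `labels_per_text` and `to_pair` helpers, and the toy drug names are hypothetical and not taken from the hackathon files. It shows how one post yields several rows with different labels, plus one common (assumed, not confirmed for the winners) way of handing the drug name to a model as a second input segment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Row:
    """One training row: a post, ONE drug mentioned in it, and the label for that drug."""
    text: str
    drug: str
    sentiment: str  # "positive", "negative" or "neutral"


def labels_per_text(rows):
    """Group rows by text so every drug-level label attached to a post is visible together."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row.text][row.drug] = row.sentiment
    return dict(grouped)


def to_pair(row):
    """Assumed input format: (text, drug) as two segments for a sentence-pair model."""
    return row.text, row.drug


# Toy post with hypothetical drug names: same text, two rows, two different labels.
post = "Drug A finally got my symptoms under control, unlike Drug B."
rows = [
    Row(post, "Drug A", "positive"),
    Row(post, "Drug B", "negative"),
]
print(labels_per_text(rows))
print([to_pair(r) for r in rows])
```

Because the label depends on the (text, drug) pair rather than on the text alone, a model that only sees the text cannot tell which of the two labels it is supposed to predict; pairing the drug with the text is one way to supply that missing context.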
Given the text and drug name, the task was to predict the sentiment for each text-drug pair in the test dataset. Given below is an example of text from the dataset:
Example:
Stelara is still fairly new to Crohn’s treatment. This is why you might not get a lot of replies. I’ve done some research, but most of the “time to work” answers are from Psoriasis boards. For Psoriasis, it seems to be about 4-12 weeks to reach a strong therapeutic level. The good news is, Stelara seems to be getting rave reviews from Crohn’s patients. It seems to be the best med to come along since Remicade. I hope you have good success with it. My daughter was diagnosed Feb. 19/07, (13 yrs. old at the time of diagnosis), with Crohn’s of the Terminal Illium. Has used Prednisone and Pentasa. Started Imuran (02/09), had an abdominal abscess (12/08). 2cm of Stricture. Started Remicade in Feb. 2014, along with 100mgs. of Imuran.
The above text is positive for Stelara and negative for Remicade. Now that we have a solid understanding of what the problem at hand was, let’s dive into the winning approaches!
As I mentioned earlier, winning a hackathon is extremely difficult. I loved going through the top solutions and approaches shared by our winners. First, let’s look at who won and congratulate them.
Here are the final rankings of all the participants on the Leaderboard.
The top 3 winners have shared their detailed approaches from the competition. I am sure you are eager to know their secrets, so let’s begin.
Here’s what Mohsin shared with us:
“My final solution is an ensemble of BERT and XLNet runs.”
Here’s what Harini shared with us:
“My final model was an ensemble of 3 BERT and 1 AEN.”
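Both Mohsin and Harini submitted ensembles. Their write-ups do not say exactly how the models were combined, so the sketch below assumes one common option: a (optionally weighted) average of each model's per-class probabilities followed by an argmax, often called soft voting. The function name `ensemble`, the class order (positive, negative, neutral), the weights and all probabilities are illustrative assumptions.

```python
def ensemble(model_probs, weights=None):
    """Weighted average of per-class probabilities across models, then argmax.

    model_probs: one entry per model; each entry is a list (one item per sample)
                 of class probabilities in a fixed order, e.g. [positive, negative, neutral].
    weights:     optional per-model weights; equal weights if omitted.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    averaged = []
    for i in range(n_samples):
        # Weighted mean of class c's probability across all models for sample i.
        averaged.append([
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(n_classes)
        ])
    # Predicted class index = position of the highest averaged probability.
    predictions = [max(range(n_classes), key=row.__getitem__) for row in averaged]
    return averaged, predictions


# Made-up probabilities for two samples from three BERT runs and one AEN model.
bert_1 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
bert_2 = [[0.5, 0.4, 0.1], [0.1, 0.6, 0.3]]
bert_3 = [[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]
aen = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]

_, equal = ensemble([bert_1, bert_2, bert_3, aen])
_, weighted = ensemble([bert_1, bert_2, bert_3, aen], weights=[1, 1, 1, 3])
print(equal)     # [0, 2]
print(weighted)  # [1, 2]
```

With equal weights the three BERT runs outvote the single AEN on the first sample; tripling the AEN's weight flips that prediction. In practice such weights would be chosen on a validation split rather than by hand.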
Here’s what Melwin shared with us:
“I noticed pretty early that increasing the max sequence length increased the score sufficiently. This observation more or less dictated my approach. I used a basic XLNet model with hardly any feature engineering.”
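Melwin did not spell out why longer sequences helped. One plausible factor, offered here purely as an illustration, is truncation: posts can be long (like the Stelara example above), and with a short maximum sequence length the model never sees a drug mentioned near the end. The sketch uses whitespace tokenization as a stand-in for XLNet's subword tokenizer (real tokenizers usually produce more tokens per word and also reserve positions for special tokens), and the helper `mention_survives` is hypothetical.

```python
import string


def tokenize(text):
    """Whitespace tokenizer (stand-in for a subword tokenizer); strips punctuation, lowercases."""
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return [t for t in tokens if t]


def truncate(tokens, max_len):
    """Head-only truncation: keep the first max_len tokens and drop the rest."""
    return tokens[:max_len]


def mention_survives(text, drug, max_len):
    """Return True if the drug name still appears after truncating text to max_len tokens."""
    kept = truncate(tokenize(text), max_len)
    target = tokenize(drug)
    n = len(target)
    return any(kept[i:i + n] == target for i in range(len(kept) - n + 1))


# Toy post: the drug is mentioned only after 300 filler words.
long_post = "filler " * 300 + "Remicade made things worse."
for max_len in (128, 256, 512):
    print(max_len, mention_survives(long_post, "Remicade", max_len))
```

Here the mention is lost at 128 and 256 tokens and only survives at 512, which is the kind of effect that could make a longer maximum sequence length pay off on long forum posts.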
It was great fun interacting with these winners and getting to know their approaches during the competition. This was a tightly contested hackathon and, as you have already seen, the winning approaches were superb.
I encourage you to head over to the DataHack platform TODAY and participate in the ongoing and upcoming hackathons. It will be an invaluable learning experience!
The code for the top solutions is available at this link: https://github.com/kunalj101/Innoplexus_sentiment_analysis_top_solutions