Bring it on! Analytics Vidhya Author identification challenge

Kunal Jain Last Updated : 18 Apr, 2015
What is the best form of analytics learning? Applying it to practical problems! This is exactly what led us to create this interesting problem, which should be a lot of fun to solve! The challenge is a good combination of basic text mining and predictive modeling. If you haven’t got your hands dirty learning these techniques, now is the time to do it!

Background

It’s family time for Kunal and Tavish! Both of them are on a break and have decided to stay away from any email / phone communications. Other Analytics Vidhya team members are not only filling their shoes, but also have their own tasks to complete!
Meet Navnit, our Tech Lead and web developer – who has decided that this is probably the best time to change the CMS (content management system) for our site. He creates a backup of data in our database, moves it over to a DVD and starts the migration.
Due to a mismatch in data models, a few fields got lost in the transition. Navnit realized this only after he had deleted all the data on the previous CMS. His first thought: nothing to worry about, he has the backup and can restore the lost fields from the DVD. He checks that the DVD is on his desk and plans to restore the data first thing tomorrow morning.
Enter Jenika, Kunal’s lovely year-old daughter, who feels that the entire Analytics Vidhya office is her playground. During her visit, she comes across this shiny blue disc, which she has not seen before! Nice toy Dad has in his office, she might have thought! In the hour she had before Navnit was back in the office, she tried eating her new toy, sliding it on the ground and what not!
Poor Navnit, his only source for the missing fields cannot be used now! He checked, and the last backup was taken on 6th July 2014, 11:59 p.m. On comparing, one of the most critical fields lost is the author name. So he cannot identify the author of articles posted after 6th July 2014. He decides to finally learn and apply some predictive analytics for his work!
It’s your turn to help Navnit get back the data before Kunal and Tavish are back in the office (1st September 2014)!

Problem statement:

Classify all the articles written by Tavish or Kunal on analyticsvidhya.com by the author’s name.
What data do you need to use for training your model?
All the articles written by Tavish and Kunal before 7th July 2014 can be used to train the model. You can use the publication date, the day of publication, the tags and the content of each article. The data needs to be scraped from the website and used on a local server.
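One possible way to turn the fields listed above into model inputs is sketched below. It is only an illustration: the function name `article_features`, the feature naming scheme and the sample article are assumptions, not part of the challenge, and the bag-of-words tokenizer is deliberately crude.

```python
import re
from collections import Counter
from datetime import date


def article_features(published, tags, content):
    """Turn one article into a flat feature dict (hypothetical layout)."""
    features = Counter()
    # Day of publication as a category: Monday is 0, Sunday is 6.
    features["weekday=%d" % published.weekday()] = 1
    # Month of publication as a coarse stand-in for the date itself.
    features["month=%d" % published.month] = 1
    # One indicator feature per tag, normalized to lower case.
    for tag in tags:
        features["tag=" + tag.strip().lower()] = 1
    # Bag of words: count every lower-cased token in the content.
    for token in re.findall(r"[a-z']+", content.lower()):
        features["word=" + token] += 1
    return dict(features)


# Made-up article, for illustration only.
example = article_features(
    date(2014, 6, 30),
    ["R", "Text Mining"],
    "Text mining is fun. Mining text in R.",
)
print(example["word=mining"], example["tag=text mining"])
```

A dict of this shape can be fed to most classifiers after vectorization; which features actually separate the two authors is for you to find out.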
What data do you need to score your model?
All the articles (excluding this article) written by Tavish and Kunal after 6th July 2014 need to be scored using your model.
Help: You can refer to this video to get started with the analysis.
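The training / scoring split described above comes down to a date comparison against the backup cutoff. Here is a minimal sketch, assuming each scraped article is stored as a dict with a `published` date; the record layout, the titles and the function name `split_articles` are hypothetical.

```python
from datetime import date

# Last day covered by the backup: articles published up to and including
# this date still have a known author.
CUTOFF = date(2014, 7, 6)


def split_articles(articles, cutoff=CUTOFF):
    """Return (training, scoring) lists based on publication date."""
    training = [a for a in articles if a["published"] <= cutoff]
    scoring = [a for a in articles if a["published"] > cutoff]
    return training, scoring


# Made-up records; real ones would come from the scraped site.
articles = [
    {"title": "Article A", "published": date(2014, 6, 30), "author": "Kunal"},
    {"title": "Article B", "published": date(2014, 7, 6), "author": "Tavish"},
    {"title": "Article C", "published": date(2014, 7, 7), "author": None},
    {"title": "Article D", "published": date(2014, 8, 15), "author": None},
]
training, scoring = split_articles(articles)
print(len(training), len(scoring))
```

Remember to drop this challenge article itself from the scoring list before you score.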
What is the evaluation metric?
The average mis-classification rate across training and scoring will be taken as the evaluation metric. For example, suppose 5 out of 10 articles in the scoring set and 50 out of 50 articles in the training set are classified correctly. The average mis-classification rate is then 0.5 * (5/10 + 0/50) = 0.25, so your score is 75%. You need to build a model that has high predictive power and is also stable across populations.
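The arithmetic in this example can be checked with a few lines of Python. This is a sketch of the metric as described here; the function names are made up for illustration.

```python
def misclassification_rate(correct, total):
    """Fraction of articles assigned to the wrong author."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - correct) / total


def challenge_score(train_correct, train_total, score_correct, score_total):
    """Return (average mis-classification rate, final score)."""
    average_error = 0.5 * (
        misclassification_rate(train_correct, train_total)
        + misclassification_rate(score_correct, score_total)
    )
    return average_error, 1 - average_error


# Numbers from the example: 50 of 50 correct in training, 5 of 10 in scoring.
average_error, score = challenge_score(50, 50, 5, 10)
print(average_error, score)  # prints 0.25 0.75
```

Because both sets carry equal weight, a model that memorizes the training articles but does poorly on the scoring set is still penalized heavily.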

End Notes:

  1. The aim of this challenge is to foster analytical thinking in our readers’ minds and have some fun with practical machine learning / analytics challenges!
  2. We will give the winner of this challenge a chance to blog about their solution on Analytics Vidhya. Of course, the winner also takes away all the visibility that comes with the platform!
  3. Last but not least, the entire story presented above is hypothetical. It was created with the sole aim of setting up this challenge. All our data is secure, and darling Jenika understands that she can’t play around with Dad’s stuff in the office!

Happy learning!

Bonus:

If you want to foster discussion on any aspect of this problem, please feel free to do this through comments below. This is your chance to engage in community learning!

If you like what you just read & want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.

Kunal Jain is the Founder and CEO of Analytics Vidhya, one of the world's leading communities of AI professionals. With over 17 years of experience in the field, Kunal has been instrumental in shaping the global AI landscape. His expertise spans diverse markets, from developed economies like the UK to emerging ones like India, where he has successfully led and delivered complex data-driven solutions. As a recognized thought leader, Kunal has empowered countless individuals to realize their AI ambitions through his visionary approach to AI education and community building. Before founding Analytics Vidhya, Kunal earned both his undergraduate and postgraduate degrees from IIT Bombay and held key roles at Capital One and Aviva Life Insurance across multiple geographies. His passion lies at the intersection of analytics, AI, and fostering a thriving community of data science professionals.

Responses From Readers

Ashvi

Hi I am a final year B.com(H) student at SRCC and I want to build my career in analytic. I really like your website a lot. Its a great platform for a fresher like me to get a kick start to my career! :) For solving this problem, you said that we need to prepare a model on a programming tool "R". I have never worked on this tool before. in fact never even heard of it. can you please provide me with any information about the basics to use the tool. I am a beginner in analytic so please bear with me! Regards Ashvi Mittal

Manju

Hi Kunal, Its interesting. Do we have any timeline for completion? Regds, Manju

Sandeep

I am New to the industry, still acquiring knowledge. I really don't know how to solve this problem. But am really excited to discover that to what extent analytics can help!
