7 Amazing NLP Hack Sessions to Watch out for at DataHack Summit 2019

Sneha Jain Last Updated : 09 Jan, 2020

Picture a world where:

Machines are able to have human-level conversations with us
Computers understand the context of the conversation without having to be told what the subject is
These machines can even write full-blown essays after being given the theme of the topic

This isn’t a movie script or a futuristic scenario. It is all happening right now, thanks to the power of Natural Language Processing (NLP)! Google Trends has charted an incredible rise in interest in the field over the last decade.

Awesome, right? I honestly feel the number of breakthroughs happening in this field is unparalleled. The past two years have been a blur – the Transformer architecture, introduced in 2017, has truly transformed the NLP space.

From the super-efficient ULMFiT framework to Google’s BERT, NLP is truly in the midst of a golden era. Are you ready to be part of this revolution?

Then join us at DataHack Summit 2019, India’s largest applied Artificial Intelligence and Machine Learning conference, held from 13 to 16 November 2019 at the NIMHANS Convention Center in Bengaluru!

Reserve Your Seat TODAY!

I am sure you are already eager to learn more about these latest NLP frameworks. So why wait? Let me take you through the exciting hack sessions we have in store for you presented by top NLP experts.

Hack Sessions are one of the most in-demand and popular features of DataHack Summit. They are essentially hour-long live interactive coding sessions presented by the top data scientists from around the globe – a dream for all machine learning professionals!

Here’s the List of Power-Packed NLP Hack Sessions at DHS 2019

Comparison of Transfer Learning Models in NLP
Synthetic Text Data Generation using RNN based Deep Learning Models
Identifying Security Vulnerabilities in Software using Deep Transfer Learning for NLP
Deep Learning for Search in E-Commerce
Intent Identification for Indic Languages
Interpreting State-of-the-Art NLP Models
Automatic Subtitle Generation using NLP and Deep Learning

Comparison of Transfer Learning Models in NLP by Sudalai Rajkumar (SRK)

Have you come across the term Transfer Learning yet? If you haven’t, you need to get up to speed quickly! Almost every recent breakthrough in NLP and computer vision relies on transfer learning, and it is what makes these models accessible to the masses.

Training a very complex supervised machine learning model from scratch can require enormous amounts of memory and compute. State-of-the-art NLP frameworks like Google’s BERT, OpenAI’s GPT-2, Transformer-XL, XLNet, etc. are excellent in theory, but they require a ton of compute power. Not everyone has access to GPUs!

This is where transfer learning has been a game-changer. It lets us start from models that someone else has already pre-trained and adapt them on our own machines, without having to shell out money on the GPUs and computational resources needed to train them from scratch.

In this hack session, our eminent speaker and leading data scientist Sudalai Rajkumar will walk you through comparing the performance of these different pre-trained NLP models, along with pre-trained word vector models, on text classification tasks. It’s going to be one incredible hack session!

Key Takeaways from this Hack Session

Build pre-trained word embedding models for text classification
Use state-of-the-art NLP models like BERT, XLNet, and XLM
Fine-tune pre-trained language models for text classification tasks
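
To make the idea of reusing pre-trained word vectors for text classification a little more concrete, here is a minimal sketch. It is not the code from the session: the three-dimensional vectors in PRETRAINED are made up for illustration (real pre-trained vectors have hundreds of dimensions and are loaded from a file), and the nearest-centroid classifier is simply the smallest model that shows the point. Notice that "great" and "awful" never appear in the training examples, yet the classifier handles them because the knowledge lives in the vectors.

```python
import math

# Hypothetical 3-dimensional "pre-trained" word vectors, invented for this sketch.
PRETRAINED = {
    "great": [0.9, 0.1, 0.0],
    "love": [0.8, 0.2, 0.1],
    "awful": [-0.9, 0.1, 0.0],
    "hate": [-0.8, 0.2, 0.1],
    "movie": [0.0, 0.9, 0.3],
    "film": [0.0, 0.8, 0.4],
}
DIM = 3


def embed(text):
    """Average the vectors of the known words; unknown words are skipped."""
    vectors = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def fit_centroids(examples):
    """'Train' by averaging the sentence embeddings of each label."""
    grouped = {}
    for text, label in examples:
        grouped.setdefault(label, []).append(embed(text))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = embed(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Only two labelled examples: the word vectors carry the rest of the knowledge.
train = [("love movie", "positive"), ("hate movie", "negative")]
centroids = fit_centroids(train)

print(predict("great film", centroids))
print(predict("awful film", centroids))
```

Fine-tuning a model like BERT follows the same spirit, except that the pre-trained weights themselves are also updated on the labelled data instead of being kept frozen.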

Intent Identification for Indic Languages by Krupal Modi

मुझे बुखार है ("I have a fever"), मेरा शरीर गर्म है ("My body is hot"). If you understand Hindi, you would have instantly understood what they meant. The two sentences convey the same meaning, but recognizing that is incredibly difficult for machines.

In fact, one of the biggest challenges in NLP right now is building models for non-English languages. We felt we really had to include this topic at DataHack Summit 2019.

In the booming age of smart devices, accurately detecting the intent of the user from a natural language utterance is one of the fundamental problems to be solved in order to truly move from clicks to conversations.

This hack session by Haptik’s Director of Machine Learning, Krupal Modi, will focus on solving this problem for low-resource languages.

Key Takeaways from this Hack Session

Understanding the granular problems and challenges of intent identification
Different approaches to solve the problem
Exposure to available public datasets for Indic languages and their utility

Identifying Security Vulnerabilities in Software using Deep Transfer Learning for NLP by Dipanjan Sarkar

I feel security is a topic not often spoken about in this space. When was the last time you heard of deep learning being used to prevent adversarial attacks?

Vulnerabilities are quite common in software systems and can potentially cause a plethora of problems including deadlock, information loss, or system failure. The challenge lies in sufficiently capturing both semantic and syntactic representations of source code for building accurate prediction models.

Dipanjan Sarkar, one of the most popular speakers in the data science community, will show you how to apply state-of-the-art deep learning models for NLP to GitHub events data in order to predict probable vulnerabilities, with decent precision and recall rates on data tested up to 2018.

He will also share some unique use cases where deep transfer learning can be applied to text data, and cover some interesting models including stacked bi-directional GRUs, pre-trained embeddings, and transformer models like BERT.

Deep Learning for Search in E-Commerce by Sonu Sharma and Atul Agarwal

Machine learning is used in almost every part of the system at major search engines like Google, Bing, etc. Most e-commerce websites are powered by search engines too, and these provide excellent ROI by helping to retain users and finally convert them into buyers.

And the fact remains – improving search results offers a huge return on investment for retailers. It’s a topic I feel everyone should at least be aware of in their organization.

In this hack session at DataHack Summit 2019, Atul Agarwal and Sonu Sharma, software engineers at Walmart Labs, will come together and share insights on how to use NLP-based deep learning models to effectively design search platforms, with a focus on the e-commerce use case.

Key Takeaways from this Hack Session

Learn about Natural Language Processing techniques like word embeddings (BERT, ELMo), Bi-LSTM networks, etc.
Application of Named Entity Recognition (NER) and seq2seq modeling in the search domain
Get to know about various Multi-Class/Multi-Label Classification Problems related to Search Domains

BERT is the hottest trending NLP framework and you can get a complete understanding of it in this article:

Demystifying BERT: A Comprehensive Guide to the Groundbreaking NLP Framework

Interpreting State-of-the-Art NLP Models

Building a complex and dense machine learning model has the potential of reaching our desired accuracy, but does it make sense? Can you open up the black-box and explain how the final results are derived?

This question is at the heart of machine learning. The ability to interpret and explain your model’s results is a critical requirement for stakeholders and clients.

Recent progress in NLP, with the advent of attention-based models, has made it easier for us to interpret and understand the decisions of the model. Here is a fantastic hack session to learn how to interpret sequence-to-sequence models with our amazing speaker Logesh Kumar Umapath.

His hack session will include techniques that can be used to interpret the decisions of Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks and Transformer models. It will also include model-agnostic techniques for interpretation.

Key Takeaways from this Hack Session

Learn how to leverage attention models and layers for interpretation
Learn model-agnostic techniques for interpretation of NLP models
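
If you are wondering what "using attention for interpretation" looks like in practice, here is a small sketch, assuming a single scaled dot-product attention head. The tokens, the query and the key vectors are all hypothetical values chosen for illustration; in a real model they would come from a trained network. The idea is simply to compute the attention weights and look at which input tokens receive the most weight.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical tokens with made-up 2-dimensional key vectors.
tokens = ["the", "food", "was", "terrible"]
keys = [[0.1, 0.0], [0.5, 0.2], [0.1, 0.1], [2.0, 1.0]]
# Hypothetical query, e.g. the state that produces a sentiment prediction.
query = [1.0, 1.0]

weights = attention_weights(query, keys)

# Rank the tokens by how much attention they receive.
for token, weight in sorted(zip(tokens, weights), key=lambda pair: -pair[1]):
    print(f"{token:10s} {weight:.3f}")

top_token = tokens[weights.index(max(weights))]
```

Attention weights are best treated as a hint about what the model focused on rather than a complete explanation, which is one reason model-agnostic techniques are useful alongside them.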

Synthetic Text Data Generation using RNN based Deep Learning Models by Raghav Bali

Handwritten text recognition requires a large number of labeled samples, which are really costly to produce. Yes, there is that cost factor again. How cool would it be if we could build a handwritten text generation model without having to pay for all that labeling out of our own pockets?

Well, you don’t have to wait long to find the answer!

Raghav Bali, a Senior Data Scientist at UnitedHealth Group, will facilitate a hands-on code walkthrough which will enable you to prepare a simple deep learning model to generate text using Recurrent Neural Networks (RNNs).

You will also gain insights on use cases where deep learning techniques are being utilized to generate data using some interesting architectures. Here’s a quick summary of what Raghav will be covering:

A quick overview of RNNs and different DL architectures for such a use case
A brief introduction to some interesting research into this domain
Hands-on code walkthrough to prepare a simple DL model to generate text
Model fine-tuning and results

Key Takeaways from this Hack Session

Understand the use of DL models to generate data (this talk is not about GANs!)
Build a DL model to generate synthetic handwritten text to solve real-world problems
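
As a rough picture of how an RNN generates a sequence one step at a time, here is a minimal sketch of the sampling loop for a character-level vanilla RNN. Everything in it is an assumption made for illustration: the tiny vocabulary, the hidden size, and above all the weights, which are random and untrained, so the output is gibberish. It also generates plain characters, not handwriting. The point is only the mechanics: feed a character in, update the hidden state, sample the next character from the output distribution, and repeat.

```python
import math
import random

random.seed(0)

VOCAB = list("abcdefgh ")  # hypothetical toy vocabulary
HIDDEN = 8                 # hypothetical hidden-state size


def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_xh = rand_matrix(HIDDEN, len(VOCAB))  # input  -> hidden
W_hh = rand_matrix(HIDDEN, HIDDEN)      # hidden -> hidden (the recurrence)
W_hy = rand_matrix(len(VOCAB), HIDDEN)  # hidden -> output scores


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def step(char_index, hidden):
    """One RNN step: returns next-character probabilities and the new state."""
    one_hot = [1.0 if i == char_index else 0.0 for i in range(len(VOCAB))]
    from_input = matvec(W_xh, one_hot)
    from_state = matvec(W_hh, hidden)
    new_hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
    return softmax(matvec(W_hy, new_hidden)), new_hidden


def generate(seed_char, length):
    """Sample `length` characters, feeding each sample back in as input."""
    hidden = [0.0] * HIDDEN
    index = VOCAB.index(seed_char)
    output = [seed_char]
    for _ in range(length):
        probs, hidden = step(index, hidden)
        index = random.choices(range(len(VOCAB)), weights=probs)[0]
        output.append(VOCAB[index])
    return "".join(output)


sample = generate("a", 20)
print(sample)
```

Training replaces the random matrices with weights learned from real sequences, and that is the step that turns this loop into a useful generator.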

You can take this quick tutorial on building a Recurrent Neural Network from Scratch in Python:

Build a Recurrent Neural Network from Scratch in Python – An Essential Read for Data Scientists

Automatic Subtitle Generation using NLP and Deep Learning by Prateek Joshi and Mohd Sanad Zaki Rizvi

Ever watched videos on YouTube or movies on Netflix and wondered how they generated such accurate subtitles? Doing that manually is a thankless and, at that volume, practically impossible job. Imagine the scale at which these platforms operate. They need a machine learning-powered solution.

That’s exactly what we will learn through this hack session by Analytics Vidhya’s two outstanding data scientists – Prateek Joshi and Mohd Sanad Zaki Rizvi.

In this exciting hack session, they will combine NLP and audio processing to automatically generate subtitles from a video. Here is the structure of the session they’re planning:

About Speech-to-Text Conversion
- History
- Use Cases
- Challenges
Dataset and Approaches
Pretrained model vs. Model built from scratch
Python notebook walkthrough

Key Takeaways from this Hack Session

Working with Sequence Data
Processing Raw Audio
Converting Audio to Text
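
Once audio has been converted to text, the last step in a subtitle pipeline is writing the timed text out in a subtitle format. The sketch below assumes the widely used SubRip (.srt) layout, which the session description does not name, and it assumes a speech-to-text model has already produced (start, end, text) segments. The segments shown are hypothetical, and the speech recognition itself is not covered here.

```python
def to_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Turn (start_seconds, end_seconds, text) tuples into SRT-formatted text."""
    blocks = []
    for number, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Hypothetical output of a speech-to-text model.
segments = [
    (0.0, 2.5, "Welcome to the hack session."),
    (2.5, 5.0, "Let's generate some subtitles."),
]

srt_text = to_srt(segments)
print(srt_text)
```

Each block is a running number, a time range, and the text, separated from the next block by a blank line.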

If you are interested in learning how to build your own speech to text model then here is a fantastic guide for you:

Learn how to Build your own Speech-to-Text Model (using Python)

End Notes

So are you ready to broaden your horizons and expand your skillset? This is the best time to get involved in the world of NLP, the hottest area in data science right now.

So why wait? DataHack Summit 2019 seats are filling fast and we have just a few tickets left, so:

Reserve Your Seat TODAY!

It will be great to network with you at DataHack Summit 2019 – see you soon!

Sneha Jain

The heart of every marketing campaign is great content and I love churning just that! I am a Data Science content marketing enthusiast. Exploring the field of applied Artificial Intelligence and Machine Learning and consistently being involved in editing the content at Analytics Vidhya is how I spend my day. I have always been fueled by the passion to do something different. The core of me is always eager to explore and learn more and more each day not only in the field of Data Science but also in the field of Psychology.
