To begin with, we check the balance of the labels in the data.
sns.countplot(data=train,x="label")
plt.show()
The data seems to be balanced.
Next, we check the number of samples for each language.
chart = sns.countplot(data=train, x="language")
chart.set_xticklabels(chart.get_xticklabels(), rotation=45)
plt.show()
As is clear, out of the 15 languages, most of our data is in English (almost 6,870 samples), whereas each of the other languages has around 400 samples.
More than 50% of the data is in English.
We will need to account for this imbalance during model training.
We draw some word clouds in different languages.
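The plot_wordcloud helper used below is not shown in the post; here is a minimal sketch of what it might look like, assuming the wordcloud package and the "language" and "premise" columns of the training data:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

def plot_wordcloud(df, language):
    # Join all premise text for the given language into one string
    text = " ".join(df[df["language"] == language]["premise"])
    cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(language)
    plt.show()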
plot_wordcloud(train,"German")
plot_wordcloud(train,"Russian")
plot_wordcloud(train,"English")
plot_wordcloud(train,"Vietnamese")
As a baseline, I took the following steps:
1) Translate all the data to one language (English)
2) Use TF-IDF for vectorization of the text data
3) Use a gradient-boosted tree model (XGBoost) with GridSearchCV for hyperparameter optimization
For translation to English, you can use the googletrans library, an unofficial client for the Google Translate API:
!pip install googletrans==3.1.0a0
from googletrans import Translator

def Translate(x):
    translator = Translator()
    translator.raise_Exception = True
    return str(translator.translate(x, dest="en").text)
Using the above code, we were able to complete the first step of our outline.
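The trans_train and trans_test data frames used below are the translated copies of the raw train and test sets; the article does not show this step, so here is a minimal sketch of how they could be built with the Translate function above:

trans_train = train.copy()
trans_test = test.copy()
for col in ["premise", "hypothesis"]:
    trans_train[col] = trans_train[col].apply(Translate)
    trans_test[col] = trans_test[col].apply(Translate)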
We then used TF-IDF for vectorization; the code for the same is below.
from sklearn.feature_extraction.text import TfidfVectorizer

vect = TfidfVectorizer(ngram_range=(1,3), min_df=15, max_features=500, stop_words='english')
# The same vectorizer is re-fitted for each column; note that each test
# column is transformed before the next fit overwrites the vocabulary
train_premise = vect.fit_transform(trans_train["premise"])
test_premise = vect.transform(trans_test["premise"])
train_hypothesis = vect.fit_transform(trans_train["hypothesis"])
test_hypothesis = vect.transform(trans_test["hypothesis"])
train_lang_abv = vect.fit_transform(trans_train["lang_abv"])
test_lang_abv = vect.transform(trans_test["lang_abv"])
from scipy.sparse import hstack

X = hstack([train_premise, train_hypothesis, train_lang_abv])
X_test = hstack([test_premise, test_hypothesis, test_lang_abv])
Code for the XGBoost classifier and GridSearchCV:
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold, GridSearchCV

# Define model
model = XGBClassifier(random_state=0, use_label_encoder=False)

# Parameter grid
param_grid = {'n_estimators': [50, 150, 200],
              'max_depth': [4, 6, 8, 10, 12],
              'learning_rate': [0.025, 0.05, 0.075, 0.1, 0.125, 0.15],
              'eval_metric': ['logloss']}

# Cross validation
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Grid search
grid_model = GridSearchCV(model, param_grid, cv=kf)

# Train classifier with optimal parameters (y is the training label column)
grid_model.fit(X, y)
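After fitting, the standard GridSearchCV attributes and methods can be used to inspect the chosen hyperparameters and to score the test features built above; for example:

# Best hyperparameters found by the grid search
print(grid_model.best_params_)

# Predictions for the combined test features
test_preds = grid_model.predict(X_test)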
With this model, I was able to achieve an accuracy of 38.209%, which left a lot of room for improvement.
After researching the best ways to work on NLI problems, I came across the concepts of transformers and transfer learning.
Using XLM-RoBERTa, I was able to increase the accuracy of my model to 92%.
Transformers are deep neural network models that use attention mechanisms. Recurrent neural networks were not good with long-term dependencies, and since natural language is full of long-term dependencies, transformers were introduced to solve this problem.
Transformers can be broadly classified into two components: the Encoder and the Decoder.
The Encoder is bidirectional: it looks at the whole input sequence at once, which provides better context for the words in the sentence.
The Decoder is unidirectional simply because words are to be generated one at a time, sequentially, so it can only attend to what has been produced so far.
Attention helps the model focus on the words most relevant to the target word, wherever they occur in the sequence, by assigning them corresponding weights.
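To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention, the building block transformers use (the 3x4 toy matrices are made up for illustration):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score each query against every key, scaled by sqrt of the key dimension
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns the scores into weights that sum to 1 for each query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted average of the value vectors
    return weights @ V

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))   # 3 tokens, 4-dimensional representations
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 4)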
If you are new to the topic of transformers, I would recommend the following reading:
1) https://jalammar.github.io/illustrated-transformer/
Transfer learning was already a breakthrough concept in computer vision; in the natural language processing domain, it became mainstream with transformers.
Most of the time, we do not have high computational power or large amounts of data at our disposal. Using a model that is already trained on gigabytes of data, and fine-tuning it to our task, can save us a lot of time and give us amazing results.
RoBERTa is an improvement over the BERT model, which is why we will first have a brief discussion of the BERT model.
BERT is short for Bidirectional Encoder Representations from Transformers.
It uses a stack of transformer encoders with attention to work on NLP tasks like text classification and natural language inferencing.
It was trained on the whole of English Wikipedia. It is pre-trained on two tasks:
1) Masked Language Model
15% of the tokens are masked when provided as input to the BERT model. The model then has to predict the most likely token for each masked position, which in turn helps it learn the context of the words (a sketch of both pre-training tasks follows this list).
2) Next Sentence Prediction
It involves predicting whether the second sentence genuinely follows the given first sentence in the original text. This again helps BERT generate contextual embeddings for words.
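Both pre-training tasks are easy to try out with the transformers library; the following is a minimal sketch, where bert-base-uncased is just an example checkpoint (not the model used later in this article):

from transformers import pipeline

# Masked Language Model: rank candidate tokens for the [MASK] position
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("Paris is the [MASK] of France."):
    print(pred["token_str"], round(pred["score"], 3))

import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

# Next Sentence Prediction: does the second sentence follow the first?
tok = BertTokenizer.from_pretrained("bert-base-uncased")
nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
enc = tok("He went to the store.", "He bought a gallon of milk.", return_tensors="pt")
with torch.no_grad():
    logits = nsp(**enc).logits
print(logits.softmax(dim=-1))   # index 0 = "follows", index 1 = "does not follow"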
How is it different from other pre-trained embeddings like GloVe or Word2Vec?
Word2Vec or GloVe gives the same embedding for a word irrespective of its context, which, as we have seen, is not the case with BERT.
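This difference is easy to verify: the same word gets different BERT vectors in different contexts. A small sketch, again using bert-base-uncased as an example checkpoint:

import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    # Return BERT's contextual vector for the first occurrence of `word`
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

v1 = word_vector("I deposited cash at the bank.", "bank")
v2 = word_vector("We sat on the bank of the river.", "bank")
# Same word, noticeably different vectors in the two contexts
print(torch.cosine_similarity(v1, v2, dim=0))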
RoBERTa stands for Robustly Optimized BERT Pretraining Approach.
It differs from BERT in the amount of data it has been pre-trained on: BERT was trained on 16GB of data, whereas RoBERTa was trained on 160GB. It also uses dynamic masking instead of the static masking used by BERT.
Dynamic masking means generating a new random mask every time a sequence is fed to the model, rather than fixing the masks once during preprocessing. It performs slightly better than static masking.
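A toy sketch of the idea (purely illustrative, not the actual RoBERTa preprocessing code):

import random

def dynamic_mask(tokens, mask_token="[MASK]", prob=0.15):
    # A fresh random subset of positions is masked on every call,
    # so the model sees different masks across epochs
    return [mask_token if random.random() < prob else t for t in tokens]

sentence = "the quick brown fox jumps over the lazy dog".split()
print(dynamic_mask(sentence))   # different masks each call
print(dynamic_mask(sentence))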
We use xlm-roberta-large-xnli, an XLM-RoBERTa model fine-tuned for NLI, to solve our problem at hand.
Importing the model and tokenizer:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_model = AutoModelForSequenceClassification.from_pretrained('joeddav/xlm-roberta-large-xnli')
tokenizer = AutoTokenizer.from_pretrained('joeddav/xlm-roberta-large-xnli')
We use the pre-trained model for predictions, without any fine-tuning:
def get_tokens_xlmr_model(data):
    batch_tokens = []
    for i in range(len(data)):
        # Encode each premise/hypothesis pair as a single input sequence
        tokens = tokenizer.encode(data["premise"][i], data["hypothesis"][i],
                                  return_tensors="pt", truncation_strategy="only_first")
        batch_tokens.append(tokens)
    return batch_tokens

def get_predicts_xlmr_model(tokens):
    batch_predicts = []
    for i in tokens:
        # Take the logits for the pair and pick the highest-scoring class
        predict = nli_model(i)[0][0]
        predict = int(predict.argmax())
        batch_predicts.append(predict)
    return batch_predicts
sample_train_data_tokens = get_tokens_xlmr_model(train)
sample_train_data_predictions = get_predicts_xlmr_model(sample_train_data_tokens)
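The zero-shot predictions can then be scored against the gold labels; a short sketch, assuming the label column is named "label" as in the countplot earlier:

from sklearn.metrics import accuracy_score

print(accuracy_score(train["label"], sample_train_data_predictions))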
The RoBERTa model helped us achieve an accuracy of 92.3%, up from the 38% of our baseline model.
I hope you liked my article on Natural Language Inferencing. Please share your views in the comments below.