A beginner’s guide to understanding Text Summarization with NLP

Alifia Ghantiwala Last Updated : 25 Nov, 2024

Consider a scenario where you don’t have to read an entire article or research paper. Instead, you could read just the most important statements. This is made possible through text summarization, a widely used technique in NLP. Text summarization takes a sequence of words as input (the article) and returns a summary as output, which makes it a natural application of sequence-to-sequence models. It is highly useful in domains like financial research, question-answer bots, media monitoring, and social media marketing. In this article, we will cover text summarization in detail, including its techniques and applications in NLP.

This article was published as a part of the Data Science Blogathon.

Types of neural text summarization

In school, most of us had to convert long articles into succinct summaries. The technique we used was to grasp the underlying idea of the text and write a summary that covered all the important points. This is the idea behind abstractive text summarization, in which the machine learning model outputs the main idea of the input text using similar words but not the exact sentences of the input.

The second type is extractive summarization, where the model’s output is a subset of the input text that conveys the main idea of the article while preserving its meaning. As a personal analogy, you can think of extractive summarization as highlighting the important points of a reference paper that you are trying to understand.

As you may have guessed, extractive summarization is simpler to model than abstractive summarization. In abstractive summarization, the model must understand language and its nuances to derive meaning and produce a valid summary. In extractive summarization, the model only has to score the sentences of the input (we will discuss scoring in detail later in this article), apply a threshold, and output the most important ones.

Naturally, more research is available on extractive summarization than on abstractive summarization. In this article, we will look at extractive summarization in further detail.

Using a pre-trained summarizer and evaluating its output

What do we mean by pre-trained models? These are models that have already been trained on large datasets. A model trained on large amounts of data generally predicts better, but collecting that much data is often infeasible and training on it takes a long time. These are some of the reasons to use a pre-trained model instead of training one from scratch.

We will use the BBC News Summary dataset for this article and bert-extractive-summarizer as the pre-trained model.

The code snippet below installs the necessary libraries:

!pip install bert-extractive-summarizer
!pip install spacy
!pip install transformers # > 4.0.0
!pip install neuralcoref
!python -m spacy download en_core_web_md

After installing the above libraries and downloading the spaCy model, we can create the summarizer and pass it a sample text to view its output.

from summarizer import Summarizer

model = Summarizer()

text = "Learning NLP involves understanding basic principles of machine learning which then need to be customized for words. With the advent of using transfer learning for NLP I think it has made a huge progress in terms of its research"

# Calling the model on a string returns the extractive summary as a string
print(model(text))

As the output shows, the model provides an appropriate summary of our input text.

Now let us use the same model on our BBC News dataset; the snippet below takes care of this. As we have a total of 2225 input articles with an average length of 3000 words, I have predicted summaries only for the first 10 articles to save execution time.

from tqdm import tqdm

bert_predicted_summary = []
# Summarize only the first 10 articles to keep the run time short
for article in tqdm(df['text'][:10]):
    bert_predicted_summary.append(model(str(article)))

Below is the output: the first text is what the pre-trained model predicted, and the second is the actual summary provided in the dataset.

[Figure: summary predicted by the model, followed by the reference summary from the dataset]

Simple preprocessing, such as removing newline characters (\n) and the byte-string markers (b') that appear when files are read in binary mode, is always recommended. As the popular saying goes, garbage in, garbage out, so we need to clean our input before passing it to our model. I have used simple regular expressions to preprocess the input; the code snippet is below.

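import os
import re

path = '/kaggle/input/bbc-news-summary/bbc news summary/BBC News Summary/News Articles/'
text, type_ = [], []  # cleaned articles and their category labels
for category in os.listdir(path):
    for filename in os.listdir(os.path.join(path, category)):
        with open(os.path.join(path, category, filename), 'rb') as f:
            # str() of the list of byte lines keeps b'...' wrappers and escaped \n markers
            article = str(f.readlines())
        article = re.sub(r"\bb['\"]", '', article)           # byte-string prefixes
        article = re.sub(r"\\[nt]['\"]?", '', article)       # escaped newlines/tabs and the closing quote
        article = re.sub(r"\\xc2\\xa[0-9a-f]", '', article)  # escaped UTF-8 bytes such as \xc2\xa3
        article = re.sub(r"[-/]", '', article)               # hyphens and slashes
        article = article.lower()
        text.append(article)
        type_.append(category)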

To evaluate the output, we use the BLEU score, which the next section covers in detail.

Understanding BLEU score and its calculation

BLEU stands for Bilingual Evaluation Understudy. It is a metric widely used for machine translation, text generation, and other models whose output is a sequence of words. Let us understand how it is calculated.

The range of BLEU scores is between 0 and 1, where 0 signifies no match between the expected output and the predicted output and 1 means a perfect match. BLEU can be considered as a modification to precision to handle sequence outputs.

Consider an example. Suppose our predicted summary (the candidate) is “awesome awesome awesome” and the actual or expected summary (the reference) is “NLP is awesome”. All the words in the candidate are present in the reference, giving it a precision of 1. However, we can all agree that this is a poor-quality summary.

To overcome this, BLEU performs a simple modification: it clips the number of times a word appears in the candidate or predicted output to the maximum number of times it appears in the reference or expected output. So in the case of our example, the score now becomes 1/3 as awesome is present only once in the reference.
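The clipping step can be sketched in a few lines of Python. This is a minimal illustration that assumes whitespace tokenization; the function names are hypothetical and not part of any library.

```python
from collections import Counter

def plain_unigram_precision(candidate, reference):
    """Fraction of candidate words that appear anywhere in the reference."""
    words = candidate.split()
    ref_words = set(reference.split())
    return sum(w in ref_words for w in words) / len(words) if words else 0.0

def clipped_unigram_precision(candidate, reference):
    """Same idea, but each word counts at most as often as it occurs in the reference."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # A Counter returns 0 for missing words, so unmatched words contribute nothing
    clipped = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

print(plain_unigram_precision("awesome awesome awesome", "NLP is awesome"))    # 1.0
print(clipped_unigram_precision("awesome awesome awesome", "NLP is awesome"))  # 0.333...
```

Without clipping, the repeated word is rewarded three times; with clipping, it is credited only once because the reference contains it once.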

Taking another example, let’s say our reference is “I want to learn NLP” and our candidate is “NLP is what I want to learn”. Considering only unigrams, five of the candidate’s seven words appear in the reference, giving a precision of 5/7. The scrambled candidate “NLP learn I want to” does even better: every one of its words appears in the reference, so its unigram precision is a perfect 1, even though it is not grammatically correct.

This is why BLEU also considers longer n-grams (bigrams, trigrams, and 4-grams). If we account for bigrams in the same example, the bigrams in our candidate are “NLP is”, “is what”, “what I”, “I want”, “want to”, and “to learn”. Three of these six also occur in the reference, so the bigram precision is 3/6 = 0.5. In this way, BLEU rewards exactly matching sequences of words between candidate and reference.
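To experiment with other n-gram sizes, the sketch below computes clipped n-gram precision, defined as the number of matched n-grams divided by the number of n-grams in the candidate. It assumes whitespace tokenization and a single reference; `ngrams` and `modified_precision` are illustrative names, not library functions.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token windows of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:  # candidate is shorter than n tokens
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

reference = "I want to learn NLP"
for candidate in ("NLP is what I want to learn", "NLP learn I want to"):
    for n in (1, 2):
        print(f"{candidate!r} n={n}: {modified_precision(candidate, reference, n):.3f}")
```

With precision defined this way, the first candidate gets 5/7 on unigrams and 3/6 on bigrams, while the scrambled candidate drops from 1 on unigrams to 2/4 on bigrams, which shows how higher-order n-grams penalize wrong word order.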

BLEU also penalizes candidates that are shorter than the reference. To understand why, we can extend the original example, where the reference is “NLP is awesome”. Now consider our candidate to be “NLP is”. Its only bigram, “NLP is”, appears in the reference, so the bigram precision is 1 even though the candidate drops a third of the reference. BLEU therefore multiplies the score by a brevity penalty: divide the length of the reference by the length of the candidate, subtract the result from one, and raise e to that power, i.e. e^(1 − r/c), where r is the reference length and c is the candidate length. Candidates at least as long as the reference are not penalized. In our case, the penalty is e^(1 − 3/2) ≈ 0.61, which reduces the score from 1 to about 0.61.

We can now see why BLEU is a widely used metric, but it does have flaws: for example, it does not consider meaning. You can read more about the problems with BLEU to gain a better understanding of the metric.
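As a quick check of the arithmetic, here is a small sketch of the brevity penalty in its standard form: 1 when the candidate is at least as long as the reference, and e^(1 − r/c) otherwise, where r and c are the reference and candidate lengths in words. The function name is illustrative.

```python
import math

def brevity_penalty(candidate_len, reference_len):
    """Standard BLEU brevity penalty for a candidate of c words and a reference of r words."""
    if candidate_len == 0:
        return 0.0
    if candidate_len >= reference_len:
        return 1.0  # no penalty when the candidate is not shorter than the reference
    return math.exp(1 - reference_len / candidate_len)

candidate = "NLP is".split()
reference = "NLP is awesome".split()
print(round(brevity_penalty(len(candidate), len(reference)), 2))  # 0.61
```

For the two-word candidate against the three-word reference, the exponent is 1 − 3/2 = −0.5, so the penalty is e^(−0.5) ≈ 0.61.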

Let us now compute BLEU scores for the summaries generated by the pre-trained BERT model.

Code snippet for the calculation

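from nltk.translate.bleu_score import sentence_bleu

def calculate_bleu_score(bert_predicted_summary, df):
    for i, predicted in enumerate(bert_predicted_summary):
        candidate = predicted.split()              # tokens of the predicted summary
        reference = str(df['summary'][i]).split()  # tokens of the dataset summary
        # sentence_bleu expects a list of tokenized references and one tokenized candidate
        print(sentence_bleu([reference], candidate))

calculate_bleu_score(bert_predicted_summary, df)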

Output

[Figure: BLEU scores for the first 10 predicted summaries]

With basic preprocessing and without fine-tuning the pre-trained model, each of the first 10 predicted summaries received a good score in my run, with an average BLEU score of about 0.6. Exact values depend on how the summaries are tokenized and on the NLTK version.

Now let us dig deeper and create our own text summarizer using Python.

Coding a text summarization model in Python from scratch

Why do we need to build an extractive summarizer from scratch when we already have amazing pre-trained models available?

To help build intuition, so that we do not treat the summarizer simply as a black box that gives us our desired output. As discussed earlier, an extractive summarizer needs to score sentences and return the most important ones as the summary. Many scoring functions are possible; let us consider the one below.

We assign a score to each word based on its frequency in the entire corpus. Be sure to remove stop words, as they can skew frequency counts. Next, score the sentences in each input article by summing up the frequencies of their constituent words.

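Implementation in Python is as below:

# Requires the NLTK tokenizer data, e.g. nltk.download('punkt')
from nltk.tokenize import sent_tokenize, word_tokenize

def count_freq():
    """Count how often each token occurs across all cleaned articles."""
    res = {}
    for article in df['cleaned_text']:
        for token in word_tokenize(article):
            res[token] = res.get(token, 0) + 1
    return res

word_freq = count_freq()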

In the above function, we create a dictionary word_freq which includes word count for every word present in the corpus.

def sentence_rank(text):
    """Return one weight per sentence: the sum of its tokens' corpus frequencies."""
    weights = []
    for sentence in sent_tokenize(text):
        # .get avoids a KeyError for tokens that are missing from word_freq
        weights.append(sum(word_freq.get(word, 0) for word in word_tokenize(sentence)))
    return weights

As part of the sentence_rank function, we provide weight to the sentences which would be the sum of word counts of all words present in the sentence.
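The functions above do not remove stop words, although the scoring description recommends it. The sketch below shows the same frequency-based scoring end to end on a toy corpus using only the standard library. The stop-word list, the regex tokenizer, the naive sentence splitter, and all function names are simplifying assumptions for illustration; they stand in for NLTK's tokenizers and a full stop-word list.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); NLTK and spaCy ship fuller ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to", "was"}

def tokenize(text):
    """Lowercase the text and keep only non-stop words."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_word_freq(corpus):
    """Count non-stop words across every document in the corpus."""
    return Counter(word for doc in corpus for word in tokenize(doc))

def summarize(text, word_freq, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    scores = [sum(word_freq[w] for w in tokenize(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(top))

corpus = ["The cat sat on the mat. The cat chased the mouse and the mouse ran away. Dogs bark."]
word_freq = build_word_freq(corpus)
print(summarize(corpus[0], word_freq, n=1))
# The cat chased the mouse and the mouse ran away.
```

Because the score is a plain sum, longer sentences tend to rank higher; dividing each score by the sentence length is one variation worth trying.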

import numpy as np
from nltk.translate.bleu_score import sentence_bleu

n = 14  # number of top-ranked sentences to keep in each summary
for i in range(10):
    sentences = sent_tokenize(df['cleaned_text'][i])
    weights = sentence_rank(df['cleaned_text'][i])
    # Indices of the n highest-scoring sentences (fewer if the article is short)
    top = np.argsort(weights)[::-1][:n]
    candidate = ' '.join(sentences[j] for j in top)
    reference = str(df['summary'][i])
    print(sentence_bleu([reference.split()], candidate.split()))

In the above code snippet, we use the sentence_rank function discussed above to summarize each of the first 10 input articles and calculate their BLEU scores. n is a hyperparameter that controls the number of sentences in the generated summary; after trying a few values, I chose 14 because it gave a good BLEU score. With this very basic text summarizer, I obtained an average BLEU score of about 0.5, which is 0.1 lower than what the pre-trained model achieved on the same input.

For improving the text summarizer, we could use

1) TF-IDF scores instead of just using word frequencies

2) Sequence to Sequence Encoder-Decoder models and so on
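As a starting point for the first idea, here is a minimal sketch of TF-IDF sentence scoring using only the standard library. It assumes each sentence is treated as its own document and uses the unsmoothed inverse document frequency log(N / df); libraries such as scikit-learn apply smoothing and normalization, so their numbers will differ. The function names are illustrative.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of the TF-IDF weights of its words."""
    tokenized = [tokenize(s) for s in sentences]
    n_sentences = len(tokenized)
    # Document frequency: the number of sentences that contain each word
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = sum(
            (count / len(tokens)) * math.log(n_sentences / doc_freq[word])
            for word, count in term_freq.items()
        )
        scores.append(score)
    return scores

sentences = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the stock market fell sharply",
]
for sentence, score in zip(sentences, tfidf_sentence_scores(sentences)):
    print(f"{score:.3f}  {sentence}")
```

A word such as "the" that occurs in every sentence gets an IDF of log(1) = 0, so it no longer inflates sentence scores the way raw frequency counts do.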

While there is definitely scope for improvement for our text summarizer, I would end this article here. If you have any suggestions regarding the improvement of the article, feel free to comment below.

Conclusion

In summary, this guide introduced you to making short text summaries using NLP. We covered different types, using ready-made tools, and how to measure success with BLEU score. Plus, you’ve got a taste of creating your own summarizer in Python. Now, you’re all set to summarize text like a pro!

I am Alifia, currently working as an analyst. By writing these articles I try to deepen my understanding of applied machine learning.  

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion

Frequently Asked Questions

Q1. How is BERT used for text summarization?

BERT is a language model pre-trained on large amounts of text. For summarization it can be fine-tuned, or, as with the extractive summarizer used in this article, its sentence embeddings can be used to pick the most representative sentences of the input. This helps in making quick and efficient summaries of long pieces of writing.

Q2. What is the objective of text summarization?

The goal of text summarization is to make things shorter while keeping the important stuff. It’s like making a quick version that highlights the main ideas, making it easier and faster for people to understand.

I investigate data on a daily basis to find insights! I write so that I can understand more clearly. I have completed my degree in Computer Engineering and won my first public data science competition in March 2021, hosted on Kaggle by Google Developers.
