Summarizing long pieces of text is a challenging problem. Summarization is done in two primary ways: the extractive approach and the abstractive approach. In this work, we break the problem of meeting summarization into an extractive component and an abstractive component that together generate a summary of the conversation.
What is Dialogue Summarization?
Humans are social animals: we exchange ideas, share information, and make plans with each other. Text and speech are the two common mediums of conversation, with speech the more common of the two. With the abundance of digital conversation happening over online messaging, IRC, and meeting platforms, and the ubiquity of automatic speech recognition systems, vast amounts of meeting transcripts are being produced.
The need to succinctly summarize conversational content therefore arises naturally, and several methods of generating such summaries have been proposed. One standard and crucial application of conversational dialogue summarization is meeting summary generation.
Traditional methods used various extractive summarization approaches, which could only pull the important phrases out of a document. Newer techniques introduced transformer-based models capable of generating coherent, abstractive summaries.
Given this body of research and the limitations of current models, we present a novel hierarchical model: an extractive-to-abstractive approach that performs extractive summarization followed by abstractive summarization, and achieves the highest ROUGE scores.
Our best-performing extractor is a two-step hierarchical model that encodes each sentence with a pre-trained BERT model, applies a bidirectional LSTM to create each utterance's sentence embedding, and then passes these embeddings to an unsupervised clustering mechanism to detect key sentences. For the abstractive module, our best-performing model is a fine-tuned PEGASUS model that generates the abstractive summary.
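Here is a minimal PyTorch sketch of such an utterance encoder; the checkpoint name, hidden size, and pooling of the final LSTM states are illustrative assumptions, not the exact training setup:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class UtteranceEncoder(nn.Module):
    """Encodes one utterance: BERT token states pooled by a BiLSTM."""
    def __init__(self, hidden_size=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.lstm = nn.LSTM(
            input_size=self.bert.config.hidden_size,  # 768 for bert-base
            hidden_size=hidden_size,
            bidirectional=True,
            batch_first=True,
        )

    def forward(self, input_ids, attention_mask):
        token_states = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        _, (h_n, _) = self.lstm(token_states)
        # Concatenate the final forward and backward hidden states into
        # a single fixed-size embedding per utterance.
        return torch.cat([h_n[0], h_n[1]], dim=-1)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["okay let's move on to the remote design"], return_tensors="pt")
encoder = UtteranceEncoder()
with torch.no_grad():
    embedding = encoder(batch["input_ids"], batch["attention_mask"])  # shape (1, 512)
```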
In summary, our contributions are as follows:
We present a novel approach to summarizing long inputs.
We beat the state-of-the-art (SOTA) results on the AMI meeting dataset.
Let’s Dive into the Methodology
The methodology is a two-step hierarchical extractive-to-abstractive summary generation pipeline built on the transformer-based PEGASUS architecture (Zhang et al., 2020). The base model is a standard transformer encoder-decoder with a novel pre-training technique in which whole sentences, along with a few other randomly chosen tokens, are masked with [MASK] tokens.
The input to the encoder-decoder model is an embedding of length 512, generated by processing the AMI input through a BERT-based extractive summarizer, in which we apply k-means to BERT sentence-level embeddings. We use two variants of this approach: with and without fine-tuning.
Our baseline model is also transformer-based: a pre-trained BERT encoder is used to produce extractive summaries, which are then passed to a GPT-2 decoder model to construct meaningful abstractive summaries (see figure below).
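As a rough sketch of how the second stage of such a baseline can be wired up, the extractive output can be handed to GPT-2 as a prompt. The "TL;DR:" prompt trick and the decoding settings here are illustrative assumptions, not the exact baseline configuration:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical extractive output from the BERT stage.
extract = "The team agreed the remote should use a rubber case."
inputs = tok(extract + "\nTL;DR:", return_tensors="pt")

# Greedy decoding of the continuation; the prompt tokens are sliced off.
out = gpt2.generate(**inputs, max_new_tokens=60, do_sample=False,
                    pad_token_id=tok.eos_token_id)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```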
Extractive Summarization Approach
Extractive summary generation is itself performed in two steps: the input text is converted into BERT sentence embeddings, which are then passed through an unsupervised clustering algorithm (Jin and Han, 2010) to cluster the most important sentences. These clusters are collections of sentences from the input text and represent its most relevant sentences.
Unsupervised extractive summary generation has been attempted before, and earlier work showed how clustering techniques can help select the key components of a text. Because this extractive step is unsupervised, the need for parallel annotated data is eliminated, so the approach can be applied to large corpora. As shown in Figure 1, the left section is the extractive summary generator, whose output is passed on to the abstractive summary generator on the right.
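A minimal sketch of this selection step, assuming we already have one embedding per utterance (for example, from the encoder sketched above); the number of clusters and the nearest-to-centroid selection rule are illustrative assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

def extract_key_sentences(sentences, embeddings, n_clusters=10):
    """Cluster sentence embeddings and keep one representative per cluster."""
    kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(embeddings)
    # For each centroid, find the index of its nearest sentence embedding.
    closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, embeddings)
    # Restore original transcript order so the extract reads chronologically.
    return [sentences[i] for i in sorted(set(closest))]
```

The concatenation of these representative sentences is what gets passed on to the abstractive stage.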
Abstractive Summarization Approach
The abstractive summarizer is an encoder-decoder language model, PEGASUS, used to generate a semantically sound summary. PEGASUS is a pre-training technique that introduces gap-sentence masking and generation.
The PEGASUS architecture contains 16 encoder layers and 16 decoder layers (in its large variant), which together take the masked text documents as input. The authors' hypothesis is that the closer the pre-training self-supervised objective is to the final downstream task, the better the fine-tuning performance. In this pre-training approach, several sentences are removed from each document and the model is tasked with recovering them: the input is a document with missing sentences, while the output consists of the missing sentences concatenated together.
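A toy illustration of what one such gap-sentence generation (GSG) pre-training pair looks like; in the actual objective, the gap sentences are selected by importance, scored with ROUGE against the rest of the document:

```python
# Build one (source, target) GSG pair from a toy document.
doc = [
    "The meeting opened with a recap of last week.",
    "The designer proposed a rubber case for the remote.",
    "The group agreed to keep the budget low.",
]
masked = {1}  # pretend sentence 1 was scored as most important
source = " ".join("[MASK1]" if i in masked else s for i, s in enumerate(doc))
target = " ".join(doc[i] for i in sorted(masked))
print(source)  # document with gap sentences replaced by [MASK1]
print(target)  # the sentences the model must generate
```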
This is an incredibly difficult task that may seem impossible even for people, and we don't expect the model to solve it perfectly. However, the difficulty of the task encourages the model to learn about language and facts about the world, as well as how to distill information from across a document to produce output that closely resembles the fine-tuning task. The advantage of this self-supervision is that we can create as many examples as there are documents, without any annotation, which is often the bottleneck in supervised systems.
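A minimal sketch of the abstractive step with an off-the-shelf PEGASUS checkpoint from Hugging Face; the checkpoint name and decoding parameters are assumptions, not our fine-tuned model:

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-cnn_dailymail"  # public checkpoint, assumed for illustration
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

extractive_summary = "..."  # output of the clustering step above
batch = tokenizer(extractive_summary, truncation=True,
                  padding="longest", return_tensors="pt")
summary_ids = model.generate(**batch, num_beams=4, max_length=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```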
Results
About the Metrics: ROUGE-1 and ROUGE-2
ROUGE-1 and ROUGE-2 measure unigram and bigram overlap, respectively, between a system summary and a reference summary, and have been shown to correlate well with human evaluations. ROUGE-2 scores can also be read as a rough measure of summary readability. The alternative way to evaluate the performance of an approach is human evaluation.
However, human evaluation is slow and very expensive, and empirical studies show that ROUGE-based automatic metrics track human judgments closely, so a model's performance can be judged reliably with ROUGE alone. Table 1 summarizes all the experiments performed on the summarization task with the AMI dataset; the results clearly show improvement by a considerable margin.
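For reference, a quick sketch of computing ROUGE-1 and ROUGE-2 F-scores with Google's rouge-score package (the example sentences are made up):

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=True)
scores = scorer.score(
    "the group agreed on a rubber remote case",          # reference summary
    "the team decided the remote case will be rubber",   # system summary
)
print(scores["rouge1"].fmeasure, scores["rouge2"].fmeasure)
```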
[Table 1: results of all experiments on the AMI dataset]
So, in the end, let's see whether this work was something new, or something you could find between the first and last pages of a Google search!
So far, meeting summarization has not been studied at a large scale, and there is a lot of scope for improvement in this research area, since most summarization work targets documents or news articles. To make good use of previous state-of-the-art models, we converted conversations from dialogue format to article format, so that conversation summaries could be generated with the same methodology used for news article summarization.
Our key contribution, the extractive-to-abstractive pipeline, is fundamentally the idea of combining two different steps, since it is hard to train transformers on long inputs. This two-step summarization technique builds on a state-of-the-art implementation of the PEGASUS model, fine-tuned on our data.