Recurrent Neural Networks (RNNs) are a family of neural networks used for processing sequential data. For example, consider the following equation:
h_t = f(h_{t-1}; x)    (Eq. 1)
The above equation is recurrent because the definition of h at time t refers back to the same definition at time t-1. If we want to find the value of h at the 3rd time step, we have to unfold Equation 1, i.e.
h_3 = f(h_2; x) = f(f(h_1; x); x)    (Eq. 2)
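To make this unfolding concrete, here is a minimal Python sketch; the scalar state, constant input, and the particular update function f are illustrative choices, not part of the original equations beyond their general form:

```python
import numpy as np

def f(h_prev, x):
    # Illustrative update: a tanh of a weighted combination of the
    # previous state and the current input (weights chosen arbitrarily).
    return np.tanh(0.5 * h_prev + 0.8 * x)

h = 0.0          # initial state h_0
x = 1.0          # a constant input, as in Eq. 1
for t in range(1, 4):
    h = f(h, x)  # produces h_1, h_2, h_3 -- Eq. 2 is exactly this loop unrolled
print(h)         # value of h_3
```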
Now the question arises: we already have the Feedforward Neural Network (ANN), so why should we use a Recurrent Neural Network? Let's understand this with an example:
Consider the two sentences
“I went to India in 2017” and “In 2017, I went to India”.
Now, if we ask the model to extract the information about where the person was in 2017, we would like it to recognize the year 2017 whether it appears in the second or the sixth position of the sentence.
Suppose we give these two sentences to a Feedforward Neural Network. Since it has different learning weights for each position, the model will try to learn the rules of the language separately at each position in the sentence; even though the meaning of both sentences is the same, it will treat them differently. This becomes a problem when there are many such sentences with the same logical meaning, and it will always negatively affect the model's accuracy.
NOTE: A Recurrent Neural Network shares the same learning weights across every time step, which is an important property of RNNs, and it therefore does not suffer from the above problem.
Figure 2: Architecture of a recurrent neural network, where x, h, o, L, and y represent the input, hidden state, output, loss, and target value respectively.
A Recurrent Neural Network maps an input sequence of x values to a corresponding sequence of output o values. A loss L measures the difference between the actual output y and the predicted output o. The RNN also has input-to-hidden connections parametrized by a weight matrix U, hidden-to-hidden connections parametrized by a weight matrix W, and hidden-to-output connections parametrized by a weight matrix V. Then, from time step t = 1 to t = n, we apply the following equations:
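A standard form of these updates, following Goodfellow et al.'s Deep Learning (the tanh hidden activation and softmax output here are assumptions consistent with that text), is:

a_t = b + W h_{t-1} + U x_t
h_t = tanh(a_t)
o_t = c + V h_t
ŷ_t = softmax(o_t)

where b and c are bias vectors, and the loss L_t at each step compares the prediction ŷ_t with the target y_t.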
NOTE: Recurrent neural networks with other connection patterns are less powerful and can express a smaller set of functions; this is a consequence of how their connections are made. Recurrent neural networks of the kind represented by Figure 2, however, are universal in the sense that any function computable by a Turing machine can be computed by such a recurrent network of finite size.
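To illustrate how U, W, and V interact, and how the same weights are reused at every time step, here is a minimal NumPy sketch of the forward pass; the dimensions, tanh activation, and softmax output are assumptions for the example, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, n_steps = 4, 8, 3, 5

# The same U, W, V (and biases b, c) are shared across all time steps.
U = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden
V = rng.normal(size=(output_dim, hidden_dim))  # hidden-to-output
b = np.zeros(hidden_dim)
c = np.zeros(output_dim)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.normal(size=(n_steps, input_dim))  # input sequence x_1 .. x_n
h = np.zeros(hidden_dim)                   # initial hidden state h_0

for t in range(n_steps):
    a = b + W @ h + U @ x[t]   # pre-activation a_t
    h = np.tanh(a)             # hidden state h_t
    o = c + V @ h              # output o_t
    y_hat = softmax(o)         # predicted distribution at step t
    print(t + 1, y_hat.round(3))
```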
RNNs are used in a wide range of problems:
Text summarization is the process of creating a subset that represents the most important and relevant information of the original content. For example, text summarization is useful for someone who wants to read a summary instead of the whole content; it saves time when the full original text would not have been useful to the reader.
Almost every language translation system uses an RNN in its backend. RNNs are used to convert text from one language to another: the input is the source language and the output is the target language the user wants. The most popular example of language translation is Google Translate.
Language modelling is the task of assigning a probability to sentences in a language. Besides assigning a probability to every sequence of words, a language model also assigns a probability to the likelihood that a given word (or sequence of words) follows a given sequence of words. For example, nowadays almost every messenger tries to autocomplete a sentence and show suggestions while we are typing.
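As a toy illustration of what "assigning a probability to a sentence" means, the sketch below scores a sentence with a hand-made bigram table; the words and probabilities are made up for the example, whereas a real language model (for instance an RNN) would learn these conditional probabilities from data:

```python
# P(sentence) = P(w_1) * P(w_2 | w_1) * ... * P(w_n | w_{n-1})
start_prob = {"i": 0.6, "in": 0.4}
bigram_prob = {
    ("i", "went"): 0.5,
    ("went", "to"): 0.9,
    ("to", "india"): 0.2,
}

def sentence_probability(words):
    p = start_prob.get(words[0], 1e-6)
    for prev, cur in zip(words, words[1:]):
        p *= bigram_prob.get((prev, cur), 1e-6)  # small floor for unseen pairs
    return p

print(sentence_probability("i went to india".split()))  # 0.6*0.5*0.9*0.2 = 0.054
```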
A chatbot is a computer program that simulates and processes human conversation. Chatbots can be as simple as rudimentary programs that answer an easy query with a single-line response, or as complex as digital assistants that learn and evolve from their surroundings while gathering and processing information. For example, most online customer services have a chatbot that responds to queries in a question-answer format.
A combination of a Convolutional Neural Network and a Recurrent Neural Network can be used to create a model that generates natural language descriptions of images and their regions. The model describes what exactly is happening inside an image.
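A minimal sketch of such an encoder-decoder setup is shown below, assuming PyTorch and purely illustrative layer sizes and names (none of this comes from the article): a small CNN encodes the image into a feature vector, which initializes the hidden state of an RNN decoder that emits word logits one time step at a time.

```python
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=128, hidden_dim=256):
        super().__init__()
        # CNN encoder: two conv layers, global average pooling, projection.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # Image features become the initial hidden state of the RNN decoder.
        h0 = self.encoder(images).unsqueeze(0)   # (1, batch, hidden_dim)
        x = self.embed(captions)                 # (batch, seq_len, embed_dim)
        outputs, _ = self.rnn(x, h0)             # (batch, seq_len, hidden_dim)
        return self.out(outputs)                 # word logits per time step

model = CaptionModel()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```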
The images in this article have been taken from Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
I hope you enjoyed reading the article. If you found it useful, please share it among your friends and on social media. For any queries, suggestions, or any other discussion, please ping me here in the comments or contact me via Email or LinkedIn.
Contact me on LinkedIn – www.linkedin.com/in/ashray-saini-2313b2162
Contact me on Email – [email protected]