Natural Language Processing for Beginners: Using TextBlob

shubham.jain Last Updated : 24 Oct, 2024


Introduction

Natural Language Processing (NLP) is attracting growing attention due to the increasing number of applications such as chatbots and machine translation. In some ways, the entire revolution of intelligent machines is based on their ability to understand and interact with humans.

I have been exploring NLP for some time now. My journey started with the NLTK library in Python, which was the recommended library for getting started at that time. NLTK is a perfect library for education and research, but it becomes very heavy and tedious for completing even simple tasks.

Later, I was introduced to TextBlob, which is built on the shoulders of NLTK and Pattern. A big advantage of TextBlob is that it is easy to learn and offers a lot of features such as sentiment analysis, part-of-speech (POS) tagging and noun phrase extraction. It has now become my go-to library for performing NLP tasks.

On a side note, there is spaCy, which is widely recognized as one of the most powerful and advanced libraries for NLP tasks. But having worked with both spaCy and TextBlob, I would still suggest TextBlob to a beginner because of its simple interface.

If it is your first step in NLP, TextBlob is the perfect library for you to get hands-on with. The best way to go through this article is to follow along with the code and perform the tasks yourself. So let’s get started!

Note: This article does not cover NLP tasks in depth. If you want to revise the basics first, you can always go through this article and come back here.

1. About TextBlob
2. Setting up the System
3. NLP tasks using TextBlob
3.1 Tokenization
3.2 Noun Phrase Extraction
3.3 Part-of-speech Tagging
3.4 Word Inflection and Lemmatization
3.5 N-grams
3.6 Sentiment Analysis
4. Other cool things to do
4.1 Spelling Correction
4.2 Creating a short summary of a text
4.3 Translation and Language Detection
5. Text classification using TextBlob
6. Pros and Cons
7. End Notes

1. About TextBlob

TextBlob is a Python library that offers a simple API for performing basic NLP tasks.

A good thing about TextBlob is that its objects behave much like Python strings, so you can transform and play with them just as you would with ordinary strings.

So, to try these things on your own, let’s quickly install TextBlob and start coding.

2. Setting up the System

Installing TextBlob on your system is a simple task: all you need to do is open the Anaconda Prompt (or a terminal on macOS or Ubuntu) and enter the following command:

pip install -U textblob

This will install TextBlob. For the uninitiated – practical work in Natural Language Processing typically uses large bodies of linguistic data, or corpora. To download the necessary corpora, you can run the following command

python -m textblob.download_corpora

3. NLP tasks using TextBlob

3.1 Tokenization

Tokenization refers to dividing text or a sentence into a sequence of tokens, which roughly correspond to “words”. This is one of the basic tasks of NLP. To do this using TextBlob, follow these two steps:

Create a TextBlob object by passing a string to it.
Call the TextBlob property or method that performs the specific task.

So, let’s quickly create a textblob object to play with.

from textblob import TextBlob

blob = TextBlob("Analytics Vidhya is a great platform to learn data science. \n It helps community through blogs, hackathons, discussions,etc.")
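To make the idea of tokenization concrete, here is a deliberately naive, regex-based sketch in plain Python of what word and sentence tokenization produce for the same text. It is only an illustration: in TextBlob you would read blob.words and blob.sentences, which rely on NLTK's tokenizers and handle cases such as abbreviations far better than these two regular expressions.

```python
import re

text = ("Analytics Vidhya is a great platform to learn data science. \n "
        "It helps community through blogs, hackathons, discussions,etc.")

def naive_sentences(raw):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", raw.strip())
    return [part.strip() for part in parts if part.strip()]

def naive_words(raw):
    """Return runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", raw)

sentences = naive_sentences(text)
words = naive_words(text)
print(sentences)
print(words)
```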

3.2 Noun Phrase Extraction

In the previous section we extracted the words; we can also extract just the noun phrases from a textblob. Noun phrase extraction is particularly useful when you want to analyze the “who” in a sentence. Let’s see an example below.

blob = TextBlob("Analytics Vidhya is a great platform to learn data science.")
for phrase in blob.noun_phrases:
    print(phrase)
>> analytics vidhya
great platform
data science

As we can see, the results aren’t perfect, but we should keep in mind that they are produced automatically by a machine.
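Under the hood, noun phrase extraction is a form of chunking: grouping neighbouring words by their part-of-speech tags. The sketch below uses a hypothetical, simplified rule for illustration, not TextBlob's default extractor (FastNPExtractor) or its alternative ConllExtractor: it joins runs of adjectives and nouns and keeps runs of two or more words that contain a noun. The tags are copied from the POS-tagging output in section 3.3.

```python
# POS tags for the example sentence, copied from the tagging output in section 3.3
tagged = [("Analytics", "NNS"), ("Vidhya", "NNP"), ("is", "VBZ"), ("a", "DT"),
          ("great", "JJ"), ("platform", "NN"), ("to", "TO"), ("learn", "VB"),
          ("data", "NNS"), ("science", "NN")]

PHRASE_TAGS = ("JJ", "NN")  # tag prefixes allowed inside a phrase (adjectives, nouns)

def chunk_noun_phrases(tagged_words):
    """Group consecutive adjective/noun tokens; keep groups of 2+ words that contain a noun."""
    phrases, current = [], []
    for word, tag in tagged_words + [("", "END")]:  # sentinel flushes the last group
        if tag.startswith(PHRASE_TAGS):
            current.append((word, tag))
            continue
        if len(current) >= 2 and any(t.startswith("NN") for _, t in current):
            phrases.append(" ".join(w.lower() for w, _ in current))
        current = []
    return phrases

print(chunk_noun_phrases(tagged))
```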

3.3 Part-of-speech Tagging

Part-of-speech tagging, or grammatical tagging, is a method of marking the words in a text according to their definition and context. In simple words, it tells whether a word is a noun, an adjective, a verb, etc. It is a more complete version of noun phrase extraction: instead of only the noun phrases, we want to find the part of speech of every word in a sentence.

Let’s check the tags of our textblob.

for word, tag in blob.tags:
    print(word, tag)
>> Analytics NNS
Vidhya NNP
is VBZ
a DT
great JJ
platform NN
to TO
learn VB
data NNS
science NN

Here, NN represents a singular noun, DT represents a determiner, and so on. These are Penn Treebank tags; you can check the full list of tags to know more.
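Because related Penn Treebank tags share a prefix (NN, NNS, NNP and NNPS are all nouns; VB, VBZ and friends are all verbs), a handy pattern is to filter on the prefix rather than on an exact tag. The sketch below assumes the (word, tag) pairs printed above; the helper name words_with_tag_prefix is hypothetical.

```python
from collections import Counter

# (word, tag) pairs as printed by blob.tags above
tagged = [("Analytics", "NNS"), ("Vidhya", "NNP"), ("is", "VBZ"), ("a", "DT"),
          ("great", "JJ"), ("platform", "NN"), ("to", "TO"), ("learn", "VB"),
          ("data", "NNS"), ("science", "NN")]

def words_with_tag_prefix(tagged_words, prefix):
    """Return the words whose tag starts with prefix, e.g. 'NN' matches NN, NNS, NNP, NNPS."""
    return [word for word, tag in tagged_words if tag.startswith(prefix)]

nouns = words_with_tag_prefix(tagged, "NN")  # every noun tag, not just 'NN'
verbs = words_with_tag_prefix(tagged, "VB")
tag_counts = Counter(tag for _, tag in tagged)

print(nouns)
print(verbs)
print(tag_counts.most_common(3))
```

With a real blob you would pass blob.tags directly. Note that an exact check such as pos == 'NN' skips plural and proper nouns like “Analytics” and “Vidhya”.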

3.4 Word Inflection and Lemmatization

Inflection is a process of word formation in which characters are added to the base form of a word to express grammatical meanings. Inflecting words in TextBlob is very simple: the words we tokenized from a textblob can easily be changed into singular or plural form.

blob = TextBlob("Analytics Vidhya is a great platform to learn data science. \n It helps community through blogs, hackathons, discussions,etc.")
print (blob.sentences[1].words[1])
print (blob.sentences[1].words[1].singularize())

>> helps
help
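To see what “adding characters to the base form” means mechanically, here is a toy, rule-based pluralizer and singularizer in plain Python. It is only an illustration: TextBlob's singularize() and pluralize() also handle irregular nouns and many exceptions that these few suffix rules ignore.

```python
import re

def naive_pluralize(noun):
    """Toy English pluralizer: three suffix rules, no irregular nouns."""
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"   # community -> communities
    if re.search(r"(s|x|z|ch|sh)$", noun):
        return noun + "es"         # box -> boxes
    return noun + "s"              # platform -> platforms

def naive_singularize(noun):
    """Rough inverse of naive_pluralize; it will mishandle many real words."""
    if noun.endswith("ies"):
        return noun[:-3] + "y"
    if re.search(r"(s|x|z|ch|sh)es$", noun):
        return noun[:-2]
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun

print(naive_pluralize("platform"), naive_singularize("helps"))
```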

The TextBlob library also offers a built-in class called Word. We just need to create a Word object and then call a method directly on it, as shown below.

from textblob import Word
w = Word('Platform')
w.pluralize()
>>'Platforms'

We can also use the tags to inflect a particular type of word, as shown below.

## using tags
for word, pos in blob.tags:
    if pos == 'NN':  # 'NN' matches singular common nouns only
        print(word.pluralize())
>> platforms
sciences

Words can be lemmatized using the lemmatize function.

## lemmatization
w = Word('running')
w.lemmatize("v") ## v here represents verb
>> 'run'
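Here we had to tell lemmatize() that “running” is a verb; by default, words are lemmatized as nouns. When lemmatizing a whole sentence, a common approach is to derive that argument from each word's POS tag. The helpers below are a hypothetical sketch of that idea: penn_to_wordnet maps Penn Treebank tags to the one-letter codes used above ('v' verb, 'a' adjective, 'r' adverb, 'n' noun).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code ('n', 'v', 'a' or 'r')."""
    if tag.startswith("VB"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("JJ"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # nouns and everything else fall back to noun


def lemmatize_tagged(tagged_words):
    """Lemmatize (Word, tag) pairs such as those in TextBlob's blob.tags."""
    return [word.lemmatize(penn_to_wordnet(tag)) for word, tag in tagged_words]


print(penn_to_wordnet("VBG"))  # a present participle such as "running" maps to 'v'
```

With TextBlob installed and the WordNet corpus downloaded, you would call lemmatize_tagged(blob.tags) on any blob.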

3.5 N-grams

An n-gram is a sequence of n consecutive words. N-grams (n > 1) are generally more informative than single words and can be used as features for language modelling. In TextBlob, n-grams can be accessed with the ngrams method, which returns a list in which each item holds n successive words.

blob = TextBlob("Analytics Vidhya is a great platform to learn data science.")
for ngram in blob.ngrams(2):
    print(ngram)
>> ['Analytics', 'Vidhya']
['Vidhya', 'is']
['is', 'a']
['a', 'great']
['great', 'platform']
['platform', 'to']
['to', 'learn']
['learn', 'data']
['data', 'science']
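The same sliding-window idea is easy to write in plain Python, and it shows why a sentence of k words has k - n + 1 n-grams (the 10 words above give 9 bigrams). This is a sketch for illustration, not TextBlob's implementation.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

words = "Analytics Vidhya is a great platform to learn data science".split()
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)
print(bigrams[:2])
print(trigrams[:2])
```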

3.6 Sentiment Analysis

Sentiment analysis is basically the process of determining the attitude or the emotion of the writer, i.e., whether it is positive or negative or neutral.

The sentiment property of a TextBlob returns two values: polarity and subjectivity.

Polarity is a float in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement. Subjective sentences generally express personal opinion, emotion or judgment, whereas objective sentences state factual information. Subjectivity is also a float, in the range [0, 1], where 0 is very objective and 1 is very subjective.

Let’s check the sentiment of our blob.

print (blob)
blob.sentiment
>> Analytics Vidhya is a great platform to learn data science.
Sentiment(polarity=0.8, subjectivity=0.75)

We can see that the polarity is 0.8, which means the statement is positive, and the subjectivity of 0.75 suggests that it is mostly a personal opinion rather than factual information.
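Lexicon-based analyzers, like the one TextBlob uses by default, score text by looking words up in a dictionary of polarity and subjectivity values. The toy sketch below averages made-up scores for the words it finds and then maps polarity to a positive/negative/neutral label. The lexicon and the 0.05 threshold are assumptions for illustration; TextBlob's analyzer is considerably more sophisticated.

```python
import re

# Illustrative mini-lexicon: word -> (polarity, subjectivity).
# The values are made up for this sketch and are NOT TextBlob's lexicon.
LEXICON = {
    "great": (0.8, 0.7),
    "good": (0.6, 0.6),
    "bad": (-0.6, 0.7),
    "terrible": (-0.9, 0.9),
}

def toy_sentiment(text):
    """Average the (polarity, subjectivity) of every lexicon word found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return (0.0, 0.0)  # no opinion words: treat as neutral and objective
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)

def polarity_label(polarity, threshold=0.05):
    """Turn a polarity in [-1, 1] into 'positive', 'negative' or 'neutral'."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

for sentence in ["Analytics Vidhya is a great platform to learn data science.",
                 "What a terrible day.",
                 "The meeting starts at noon."]:
    polarity, subjectivity = toy_sentiment(sentence)
    print(sentence, polarity, subjectivity, polarity_label(polarity))
```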

4. Other cool things to do

4.1 Spelling Correction

Spelling correction is a cool feature that TextBlob offers; it can be accessed with the correct method, as shown below.

blob = TextBlob('Analytics Vidhya is a gret platfrm to learn data scence')
blob.correct()
>> TextBlob("Analytics Vidhya is a great platform to learn data science")

We can also check the list of suggested words and their confidence scores using the spellcheck method.

blob.words[4].spellcheck()
>> [('great', 0.5351351351351351),
 ('get', 0.3162162162162162),
 ('grew', 0.11216216216216217),
 ('grey', 0.026351351351351353),
 ('greet', 0.006081081081081081),
 ('fret', 0.002702702702702703),
 ('grit', 0.0006756756756756757),
 ('cret', 0.0006756756756756757)]
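TextBlob's documentation describes its spelling correction as based on Peter Norvig's “How to Write a Spelling Corrector”. The core idea fits in a few lines: generate every string one edit away from the misspelling, keep the ones that are known words, and rank them by how often they occur. The sketch below assumes a tiny, made-up frequency table (WORD_COUNTS), so its confidence scores will not match TextBlob's, and it skips the two-edit candidates a fuller corrector would also try.

```python
import string
from collections import Counter

# Made-up word frequencies standing in for a real corpus-derived table.
WORD_COUNTS = Counter({"great": 100, "get": 60, "grew": 20, "grey": 5,
                       "platform": 40, "data": 80, "science": 30})

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def spellcheck(word):
    """Return (candidate, confidence) pairs, most frequent candidate first."""
    word = word.lower()
    if word in WORD_COUNTS:
        return [(word, 1.0)]
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return [(word, 0.0)]  # nothing known within one edit: keep the word
    total = sum(WORD_COUNTS[w] for w in candidates)
    ranked = sorted(candidates, key=lambda w: (-WORD_COUNTS[w], w))
    return [(w, WORD_COUNTS[w] / total) for w in ranked]

def correct(word):
    """Pick the highest-ranked suggestion."""
    return spellcheck(word)[0][0]

print(spellcheck("gret"))
print(" ".join(correct(w) for w in ["gret", "platfrm", "data", "scence"]))
```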

4.2 Creating a short summary of a text

This simple trick uses the things we learned above. First, take a look at the code below and try to understand it yourself.

import random
from textblob import TextBlob, Word

blob = TextBlob('Analytics Vidhya is a thriving community for data driven industry. This platform allows \
people to know more about analytics from its articles, Q&A forum, and learning paths. Also, we help \
professionals & amateurs to sharpen their skillsets by providing a platform to participate in Hackathons.')

# collect the lemma of every singular common noun (tag 'NN')
nouns = list()
for word, tag in blob.tags:
    if tag == 'NN':
        nouns.append(word.lemmatize())

print("This text is about...")
# random.sample raises ValueError if asked for more items than the list holds
for item in random.sample(nouns, min(5, len(nouns))):
    word = Word(item)
    print(word.pluralize())

>> This text is about...
communities
platforms
forums
platforms
industries

Simple, ain’t it? Above, we extracted a list of nouns from the text to give the reader a general idea of what the text is about.
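Because random.sample picks nouns at random, the summary changes on every run and can repeat a word (note “platforms” twice above). A deterministic alternative is to rank nouns by frequency. The sketch below is a hypothetical helper, top_nouns, that works on any list of (word, tag) pairs such as blob.tags; unlike the code above, it accepts every NN* tag. The sample pairs are illustrative, not real tagger output.

```python
from collections import Counter

def top_nouns(tagged_words, k=5):
    """Return up to k distinct nouns (any NN* tag), most frequent first.

    Ties keep the order in which the nouns first appear.
    """
    counts = Counter(word.lower() for word, tag in tagged_words if tag.startswith("NN"))
    return [word for word, _ in counts.most_common(k)]

# Illustrative (word, tag) pairs; with TextBlob you would pass blob.tags instead.
tagged = [("community", "NN"), ("industry", "NN"), ("platform", "NN"),
          ("people", "NNS"), ("analytics", "NNS"), ("forum", "NN"),
          ("platform", "NN"), ("Hackathons", "NNP")]

print("This text is about...", top_nouns(tagged, k=3))
```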

4.3 Translation and Language Detection

Can you guess what is written in the next line?

Haha! Can you guess which language this is? Don’t worry, let’s detect it using TextBlob…

blob.detect_language()
>> 'ar'

So, it is Arabic. Now, let’s translate it into English with TextBlob so that we know what is written.

blob.translate(from_lang='ar', to='en')
>> TextBlob("that's cool")

Even if you don’t explicitly define the source language, TextBlob will automatically detect the language and translate into the desired language.

blob.translate(to='en')  ## or you can simply do this
>> TextBlob("that's cool")

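This is seriously cool! 😀

Note, however, that detect_language() and translate() were deprecated in TextBlob 0.16 and removed in later releases, so the calls above only work with older versions; the TextBlob changelog points to the official Google Translate API instead.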

5. Text classification using TextBlob

Let’s build a simple text classification model using TextBlob. First, we need to prepare training and testing data.

training = [
('Tom Holland is a terrible spiderman.','pos'),
('a terrible Javert (Russell Crowe) ruined Les Miserables for me...','pos'),
('The Dark Knight Rises is the greatest superhero movie ever!','neg'),
('Fantastic Four should have never been made.','pos'),
('Wes Anderson is my favorite director!','neg'),
('Captain America 2 is pretty awesome.','neg'),
('Let\'s pretend "Batman and Robin" never happened..','pos'),
]
testing = [
('Superman was never an interesting character.','pos'),
('Fantastic Mr Fox is an awesome film!','neg'),
('Dragonball Evolution is simply terrible!!','pos')
]

TextBlob provides a built-in classifiers module for creating custom classifiers. So, let’s quickly import it and create a basic classifier.

from textblob import classifiers
classifier = classifiers.NaiveBayesClassifier(training)

As you can see above, we have passed the training data into the classifier.
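By default, TextBlob's NaiveBayesClassifier uses a simple feature extractor that records which words from the training set appear in a document; these are the contains(...) features you will see in the output below. You can also pass your own function through the feature_extractor argument. The extractor below is a hypothetical example that lowercases the text first, so “Is” and “is” become the same feature.

```python
import re

def lowercase_word_features(document):
    """Map a document to {'contains(word)': True} for each distinct lowercased word."""
    words = set(re.findall(r"[a-z0-9']+", document.lower()))
    return {"contains({})".format(word): True for word in words}

print(sorted(lowercase_word_features("The weather is TERRIBLE!")))
```

To use it, you would write classifiers.NaiveBayesClassifier(training, feature_extractor=lowercase_word_features).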

Note that here we have used a Naive Bayes classifier, but TextBlob also offers a decision tree classifier, as shown below.

## decision tree classifier
dt_classifier = classifiers.DecisionTreeClassifier(training)

Now, let’s check the accuracy of this classifier on the testing dataset. TextBlob also lets us inspect the most informative features.

print (classifier.accuracy(testing))
classifier.show_informative_features(3)
>> 1.0
Most Informative Features
            contains(is) = True              neg : pos    =      2.9 : 1.0
      contains(terrible) = False             neg : pos    =      1.8 : 1.0
         contains(never) = False             neg : pos    =      1.8 : 1.0

As we can see, the feature contains(is) = True is about 2.9 times as likely in neg examples as in pos examples, so in this training set a text containing “is” is pushed towards a negative label.
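Each number in that output is a likelihood ratio: how much more often a feature value occurs in one class than in the other. The sketch below reproduces the first two ratios by hand, assuming NLTK's default Naive Bayes smoothing (the expected likelihood estimate, which adds 0.5 to each count, with two possible values, True or False, per feature) and a simple regex tokenizer in place of TextBlob's.

```python
import re

training = [
    ('Tom Holland is a terrible spiderman.', 'pos'),
    ('a terrible Javert (Russell Crowe) ruined Les Miserables for me...', 'pos'),
    ('The Dark Knight Rises is the greatest superhero movie ever!', 'neg'),
    ('Fantastic Four should have never been made.', 'pos'),
    ('Wes Anderson is my favorite director!', 'neg'),
    ('Captain America 2 is pretty awesome.', 'neg'),
    ('Let\'s pretend "Batman and Robin" never happened..', 'pos'),
]

def has_word(text, word):
    """True if word appears as a whole token in text."""
    return word in re.findall(r"\w+", text)

def smoothed_prob(data, word, value, label):
    """P(contains(word) == value | label) with add-0.5 smoothing over 2 values."""
    docs = [text for text, lab in data if lab == label]
    matches = sum(1 for text in docs if has_word(text, word) == value)
    return (matches + 0.5) / (len(docs) + 2 * 0.5)

def likelihood_ratio(data, word, value):
    """How many times likelier the feature value is under 'neg' than under 'pos'."""
    return smoothed_prob(data, word, value, "neg") / smoothed_prob(data, word, value, "pos")

print(round(likelihood_ratio(training, "is", True), 1))         # -> 2.9
print(round(likelihood_ratio(training, "terrible", False), 1))  # -> 1.8
```

Both values agree with the show_informative_features output above, a reminder that these ratios come straight from counts over just seven sentences.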

To get a better feel for it, let’s check our classifier on a new piece of text.

blob = TextBlob('the weather is terrible!', classifier=classifier)
print (blob.classify())
>> neg

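So, based on training with the above dataset, our classifier returns neg for this clearly negative sentence. Take the result with a pinch of salt, though: the labels in this toy dataset look inverted relative to the actual sentiment of the sentences (for example, “Tom Holland is a terrible spiderman.” is labeled pos), so “terrible” appears only in pos-labeled training sentences, and the neg prediction comes from words such as “is” instead. With only seven training sentences, results like this should not be trusted.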

Note that we could have done some preprocessing and data cleaning here, but my aim was to give you an intuition of how text classification can be done with TextBlob.
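As an example of such preprocessing, the sketch below lowercases the text and drops a few stop words. The stop-word list is a tiny, hypothetical one chosen for illustration; removing words such as “is” would stop the classifier from leaning on them as it did above.

```python
import re

# Tiny illustrative stop-word list; real projects use a much fuller one.
STOPWORDS = {"a", "an", "the", "is", "was", "to", "for", "of", "and", "me", "my"}

def preprocess(text):
    """Lowercase the text, keep word-like tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(token for token in tokens if token not in STOPWORDS)

print(preprocess("Tom Holland is a terrible spiderman."))
print(preprocess("The weather is terrible!"))
```

You could then train on [(preprocess(text), label) for text, label in training] and classify preprocess(new_text) in the same way.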

6. Pros and Cons

Pros:

Since it is built on the shoulders of NLTK and Pattern, it is simple for beginners, providing an intuitive interface to NLTK.
Older releases provide language translation and detection powered by Google Translate (not available in spaCy).

Cons:

It is a little slower than spaCy but faster than NLTK (in speed, spaCy > TextBlob > NLTK).
It does not provide features such as dependency parsing and word vectors, which spaCy provides.

7. End Notes

I hope you had a fun time learning about this library. TextBlob provides a very easy interface for beginners to learn basic NLP tasks.

I would recommend that every beginner start with this library and then learn spaCy for more advanced work. You can still use TextBlob for initial prototyping in almost every NLP project.

You can find the full code of this article in my GitHub repository.

Also, did you find this article helpful? Please share your opinions/thoughts in the comments section below.
