Text Analysis App Using Spacy, Streamlit, and Hugging Face Spaces

UPPU RAJESH KUMAR Last Updated : 15 Mar, 2022

This article was published as a part of the Data Science Blogathon.

Introduction

Text analysis is the process of extracting meaningful, useful information from unstructured text. It is a rapidly growing area of Natural Language Processing (NLP) and is applied in many fields. Its main goal is to turn text into machine-readable information that supports data-driven decision-making, and it also helps in managing content. Text analysis can reduce human error in decisions and make them more consistent.

For example, if a product manager of an e-commerce website wants to know what the public thinks of the products, he or she would ideally have to read every review posted by customers before drawing a conclusion. This is very time-consuming and prone to human error: if the person reading the reviews misreads or misunderstands them, wrong decisions may follow. With text analysis, the same work can be done in far less time and, with a suitable model, with good accuracy. We can detect sentiment; extract keywords, names, or company information; and categorize surveys or product reviews by sentiment and topic. The commonly used text analysis techniques are:

Text Classification
Text Extraction
Word Frequency
Collocation
Concordance
Word Sense Disambiguation
Clustering

Text Classification aims to assign a predefined tag or category to the unstructured textual data. Some of the most important text classification tasks are sentiment analysis, topic modeling, language detection, and intent detection.

Text Extraction aims to pull out pieces of information that are already present in the text. Important text extraction tasks include keyword extraction and named entity recognition. These are useful for identifying relevant information.

Word Frequency measures how often words occur in a given text. Raw counts can be weighted with TF-IDF (term frequency-inverse document frequency), which lowers the score of words that appear in almost every document. We can use this to find the words customers use most often when chatting with a customer support executive, or when analyzing product reviews.
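
To make the TF-IDF idea concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization and the basic weighting (term frequency times the log of inverse document frequency); the function name tf_idf and the sample reviews are made up for illustration, and libraries typically add smoothing on top of this.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical sample reviews, used only to exercise the function.
reviews = [
    "battery life is great",
    "battery died fast",
    "great screen and great sound",
]
scores = tf_idf(reviews)
# Show the highest-weighted word in each review.
for review, weights in zip(reviews, scores):
    print(review, "->", max(weights, key=weights.get))
```

Words that occur in several reviews, such as "battery", get a lower weight than words that occur in only one, which is what distinguishes TF-IDF from a raw count.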

Collocation identifies words that commonly co-occur. Bi-grams and tri-grams (sequences of two and three adjacent words) are the usual units of collocation analysis, and they help us find the hidden semantic structure of a text.
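
As a rough illustration, bi-grams and tri-grams can be counted with the standard library alone. This sketch ranks them by raw frequency, which is a simple stand-in for the statistical association measures that dedicated collocation tools use; the helper name ngrams and the sample sentence are assumptions for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text.
text = "customer service was slow but customer service was polite"
tokens = text.split()
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))
print(bigram_counts.most_common(2))
print(trigram_counts.most_common(1))
```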

A concordance helps us find every instance of a word together with its surrounding context. Word Sense Disambiguation determines which meaning a word with more than one meaning carries in a given context. Clustering lets us group texts with common attributes into clusters. In this way, text analysis helps us find the qualitative aspects of a given text.
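
A concordance can be sketched as a small keyword-in-context function. The version below is an assumption-laden example rather than a standard implementation: it splits on whitespace, strips only a few punctuation marks, and the name concordance and its window parameter are chosen for illustration.

```python
def concordance(text, keyword, window=3):
    """Return each occurrence of keyword with up to `window` words of context per side."""
    tokens = text.split()
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        # Strip simple punctuation so "delivery," still matches "delivery".
        if token.strip(".,!?;:").lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical sample review.
review = "The delivery was late. Apart from the delivery, the product itself is fine."
for line in concordance(review, "delivery", window=2):
    print(line)
```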

In this app, we use Text Classification and Text Extraction techniques to analyze a given sentence. More specifically, we use sentiment analysis, Named Entity Recognition, and subjectivity. Subjectivity measures the extent to which a given sentence is opinionated.

Overview

Spacy
Spacy TextBlob
Streamlit
Hugging Face Spaces
Building the application
Deployment
Conclusion

Spacy

Spacy is an open-source Python library for Natural Language Processing (NLP) tasks and is widely used in industry. It is designed for production use and is fast and robust. In the app we are going to build, we shall use the Named Entity Recognition (NER) component of the Spacy library.

Spacy TextBlob

Spacy TextBlob is a pipeline component for Spacy, distributed as the separate spacytextblob package, that lets us do sentiment analysis. It gives us the sentiment (polarity) of a given sentence as well as its subjectivity. It uses the TextBlob library under the hood to get the results.

Image-1

Streamlit

Streamlit is an open-source Python library for building web apps. It can be used to quickly build ML web apps and data visualization dashboards. The library is easy to learn, and anyone can quickly pick up the skills needed to build user interfaces for their ML apps. We shall use this library to build our web app.

image-2

Hugging Face Spaces

Hugging Face Spaces is a quick way to deploy machine learning web apps. It lets us host apps on its servers, with free hosting available on basic hardware. In this project, we will host our app on Hugging Face Spaces.

image-3

Building the Application

First, we install all the necessary libraries as follows:

pip install spacy
pip install spacytextblob
pip install streamlit
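
The application also calls spacy.load('en_core_web_sm'), so the English model has to be installed as well; the usual command for that is python -m spacy download en_core_web_sm. As a quick sanity check before running the app, a sketch like the following reports which of the required modules cannot be found. The helper name missing_modules is hypothetical, and the list of import names is an assumption based on the packages used in this article.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names assumed for this project; the model installs as a package named en_core_web_sm.
required = ["spacy", "spacytextblob", "streamlit", "en_core_web_sm"]
missing = missing_modules(required)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required modules are available.")
```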

Next, we code our application as follows:

import streamlit as st
import spacy
# This import registers the 'spacytextblob' pipeline component with spacy.
from spacytextblob.spacytextblob import SpacyTextBlob

st.set_page_config(layout='wide', initial_sidebar_state='expanded')
st.title('Text Analysis using Spacy Textblob')
st.markdown('Type a sentence in the text box below and choose the desired option in the sidebar menu.')
side = st.sidebar.selectbox("Select an option below", ("Sentiment", "Subjectivity", "NER"))
Text = st.text_input("Enter the sentence")


# st.cache_data replaces st.cache, which newer Streamlit releases deprecate.
@st.cache_data
def sentiment(text):
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    doc = nlp(text)
    # Polarity ranges from -1 (negative) to 1 (positive).
    # If your spacytextblob version lacks doc._.polarity, try doc._.blob.polarity.
    if doc._.polarity < 0:
        return "Negative"
    elif doc._.polarity == 0:
        return "Neutral"
    else:
        return "Positive"


@st.cache_data
def subjectivity(text):
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    doc = nlp(text)
    # Subjectivity ranges from 0 (objective) to 1 (opinionated).
    if doc._.subjectivity > 0.5:
        return "Highly Opinionated sentence"
    elif doc._.subjectivity < 0.5:
        return "Less Opinionated sentence"
    else:
        return "Neutral sentence"


@st.cache_data
def ner(sentence):
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(sentence)
    # Each entity is returned as a (text, label) pair.
    ents = [(e.text, e.label_) for e in doc.ents]
    return ents


def run():
    # Show the result of the analysis selected in the sidebar.
    if side == "Sentiment":
        st.write(sentiment(Text))
    elif side == "Subjectivity":
        st.write(subjectivity(Text))
    elif side == "NER":
        st.write(ner(Text))


if __name__ == '__main__':
    run()

Explanation of the above code:

As a first step, we import the necessary libraries.

Next, we set our application page configuration using ‘st.set_page_config()’. After this, we give our app page a title using ‘st.title()’ and write a short description of what the app does using ‘st.markdown()’. Then we create a sidebar that shows the user the available options using ‘st.sidebar.selectbox()’, with one option for each of our three text analysis operations: ‘Sentiment’, ‘Subjectivity’, and ‘NER’.

We need to take text input from the user, which we do using ‘st.text_input()’. Next, we create three functions for the three text analysis operations. The first is the sentiment function, which uses spacy textblob to find the sentiment of the given text. Spacy textblob returns a polarity score ranging from -1 to 1, so we map that score to a label: a negative score means ‘Negative’ sentiment, a score of zero means ‘Neutral’, and a positive score means ‘Positive’. We cache this function with Streamlit's caching decorator (‘@st.cache’ in the Streamlit releases current when this article was written; newer releases replace it with ‘@st.cache_data’ and ‘@st.cache_resource’) so that the result for a given input is stored and the function does not have to be re-run every time the app reruns with the same text, which speeds up the app.
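
The mapping from polarity score to label does not depend on spacy at all, so it can be pulled out and checked on its own. The sketch below is one way to do that; the function name polarity_to_label and the range check are additions for illustration and are not part of the app code above.

```python
def polarity_to_label(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity is expected to lie between -1 and 1")
    if polarity < 0:
        return "Negative"
    if polarity == 0:
        return "Neutral"
    return "Positive"


# Hand-picked scores, not model output.
for score in (-0.6, 0.0, 0.35):
    print(score, polarity_to_label(score))
```

Keeping the thresholds in a separate function like this makes it easy to adjust them later, for example to treat scores close to zero as neutral.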

Similarly, we define the subjectivity function using spacy textblob. Since subjectivity scores range between 0 and 1, we mark the sentence as highly opinionated if the score is above 0.5, as less opinionated if the score is below 0.5, and as neutral if the score is exactly 0.5. Next, we create the Named Entity Recognition (NER) function using spacy, which returns each named entity in the text along with its label.
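
The ner function returns a flat list of (text, label) pairs. If a grouped view is easier to read, a small helper along these lines could be applied to that list before displaying it. The helper name group_entities and the sample pairs are hypothetical; the sample is hand-written in the same shape as the function's output and is not actual model output.

```python
from collections import defaultdict


def group_entities(entities):
    """Group (text, label) pairs by label, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Illustrative values only, in the (text, label) shape that ner() returns.
sample = [("Apple", "ORG"), ("London", "GPE"), ("Apple", "ORG"), ("Monday", "DATE")]
print(group_entities(sample))
```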

Finally, we create our run function, which ties together all the functions we created. If the user enters text and selects the Sentiment option in the sidebar, the sentiment function runs and displays the sentiment. If the user selects the Subjectivity option, the subjectivity function runs and displays the result as programmed. Similarly, if the user selects the NER option, the ner function runs and displays the named entities of that text.
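
The same selection logic can also be written as a lookup table that maps each sidebar option to its function. The sketch below is an alternative arrangement under stated assumptions, not the code used in the app: the name dispatch, the empty-input message, and the stand-in handlers are all hypothetical, and the stand-ins are used so that the example runs without spacy installed.

```python
def dispatch(option, text, handlers):
    """Run the handler registered for option on text."""
    if not text.strip():
        return "Please enter a sentence."
    try:
        handler = handlers[option]
    except KeyError:
        raise ValueError(f"Unknown option: {option!r}") from None
    return handler(text)


# Stand-in handlers; in the app these would be the sentiment,
# subjectivity, and ner functions.
handlers = {
    "Sentiment": lambda text: "Positive",
    "Subjectivity": lambda text: "Less Opinionated sentence",
    "NER": lambda text: [],
}
print(dispatch("Sentiment", "I love this phone", handlers))
print(dispatch("NER", "   ", handlers))
```

With this arrangement, adding a fourth analysis only requires a new entry in the dictionary and a matching option in the sidebar.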

Deployment

We have created our app, and it is time to deploy it using Hugging Face Spaces. Go to the Hugging Face website and create an account. After creating an account, click ‘create space’. The following pages ask for the name of your app and its tech stack. Give the desired name, select an appropriate license, select Streamlit under the SDK option, and finally click create. After this, you will see a page with instructions to clone the repository of your space and push your code to it. Alternatively, you can create the files directly within Spaces. Here you need to create a ‘requirements.txt’ file. Paste the content below into the requirements file.

spacy
spacytextblob
https://huggingface.co/spacy/en_core_web_sm/resolve/main/en_core_web_sm-any-py3-none-any.whl

After pasting the text into the file, click commit changes, and Spaces starts building your app. Once the build finishes, your app is ready for use.

I have already created a text analysis app as described in this article.

Please check it out here – Text Analysis With Spacy And Streamlit – a Hugging Face Space by rajesh1729

Conclusion

We have created a simple text analysis app and deployed it on Hugging Face Spaces. Apps of this kind are very useful in the eCommerce industry, the customer service industry, and similar settings. If you have any questions about the above code, please comment below so that I can answer them.

image-1 source: spaCyTextBlob · spaCy Universe

image-2 source: Streamlit • The fastest way to build and share data apps

image-3 source: Spaces – Hugging Face

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
