This article was published as a part of the Data Science Blogathon.
A business's or brand's success depends heavily on customer satisfaction. If customers do not like a product, you may need to improve it, and to find that out you have to analyze the sentiment of their opinions. Sentiment analysis is the process of identifying and categorizing the opinions expressed in a piece of text to determine whether they are positive, negative, or neutral.
In this article, we will perform sentiment analysis using VADER. Sentiment analysis extracts meaning from text, specifically the opinion it expresses, while semantics helps us interpret symbols, their types, and their relationships with each other. First, let us briefly look at what NLP is and at the NLTK library.
NLP (natural language processing) is the automatic manipulation or processing of human language. We use NLP to extract meaningful information from text. Its applications include sentiment analysis, chatbots, speech recognition, machine translation, spell checking, information extraction, keyword search, and advertisement matching. Real-world examples include Google Assistant and Google Translate.
The Natural Language Toolkit (NLTK) is one of the most widely used NLP libraries. Its packages help machines process human language and respond to it appropriately, and it provides built-in tools for processing text at each stage of a pipeline, such as data cleaning, visualization, and vectorization.
Sentiment analysis determines the polarity of a text: positive, negative, or neutral. It is an active research area in natural language processing and is widely used in data mining and text mining. It helps collect and analyze opinions about a brand or product by processing blog posts, comments, reviews, tweets, and similar content.
In sentiment analysis, we classify the polarity of a given text at the document, sentence, or feature level. It tells us about the opinion, whether it is positive, negative, or neutral.
Social media monitoring: As we all know, social media is everywhere. More than 55% of customers share reviews of their purchases on social networking sites, far too many to analyze manually. Sentiment analysis lets us analyze these reviews at scale and derive meaning from them.
Brand monitoring: Brand owners use sentiment analysis tools to keep track of negative reviews about their brand. They can also feed the results of sentiment analysis into machine learning models to predict outcomes.
Voice of customer: Sentiment analysis algorithms let us analyze the voice of the customer, for example which products customers need most and which products are rated highest. Brand owners can then create a personalized customer experience based on these insights.
Customer service: Chatbots are a widespread way of delivering good customer service. With sentiment analysis, a chatbot can hand the conversation over to a human customer service associate when needed, for example when a customer becomes frustrated. Chatbots can also automate tasks such as booking a ticket or a salon appointment.
Market research: Sentiment analysis lets you research how well your competitors are doing and what positive feedback they receive from customers. You can also analyze how they deal with their customers and, in turn, address the issues behind your own product's shortcomings.
Product analysis: You can do keyword research to identify products that are in demand and highly rated, and determine which features of a particular product customers or end users appreciate most.
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analyzer, available as an NLTK module, that scores text based on the words it contains. Each term in its lexicon is rated with a valence score that captures both its polarity (positive or negative) and its intensity, and a set of rules adjusts these scores for cues such as intensifiers and negation.
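As a rough illustration (based on VADER's reference implementation rather than on this article), VADER adds up the rule-adjusted valences $s_i$ of the words in a text and squashes the sum into the range $(-1, 1)$ with a normalization constant $\alpha = 15$ to obtain the compound score:

$$
\text{compound} = \frac{x}{\sqrt{x^{2} + \alpha}}, \qquad x = \sum_{i} s_i, \qquad \alpha = 15
$$

For example, a valence sum of $x = 2.8$ gives $2.8 / \sqrt{2.8^2 + 15} = 2.8 / \sqrt{22.84} \approx 0.5859$.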
First, we create a SentimentIntensityAnalyzer object, and then we use its polarity_scores() method to score each review in our dataset.
In this exercise, I use a CSV file containing reviews of different products. The file is available at:
https://drive.google.com/file/d/1NYdZoMJvBWuCejMX28pVRVfMyOe1GhnZ/view?usp=sharing
```python
import io

import numpy as np
import pandas as pd
import nltk

# download the VADER lexicon used by the analyzer
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# create a sentiment intensity analyzer object
sia = SentimentIntensityAnalyzer()

# upload the CSV file (files.upload() works only in Google Colab)
from google.colab import files
uploaded = files.upload()

# read the uploaded CSV file into a DataFrame
df = pd.read_csv(io.BytesIO(uploaded['reviews.csv']))
df.head()
```
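polarity_scores(): this method takes a text string and returns a dictionary with four scores: neg, neu, and pos, the proportions of the text that are negative, neutral, and positive, and compound, a normalized overall score between -1 (most negative) and +1 (most positive).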
For example:
```python
text = "Bobby is an amazing guy"
sia.polarity_scores(text)
# {'compound': 0.5859, 'neg': 0.0, 'neu': 0.513, 'pos': 0.487}
```
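The compound score (0.5859) is positive, so this statement is classified as positive. Its neu score is relatively high only because most of its words carry no sentiment; the overall polarity is read from the compound score.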
```python
text = "The food delivered was really very bad"
sia.polarity_scores(text)
# {'compound': -0.6214, 'neg': 0.404, 'neu': 0.596, 'pos': 0.0}
```
This statement is negative: its compound score is -0.6214, and the intensifiers "really" and "very" strengthen the negative word "bad".
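To see where the numbers in the two examples above come from, here is a heavily simplified, pure-Python imitation of VADER's scoring. It is a sketch, not NLTK's code: the two-word lexicon (amazing = 2.8, bad = -2.5) is an assumption chosen because it reproduces the outputs shown above, the booster increment 0.293 and the damping factors 0.95 and 0.9 mirror VADER's reference implementation, and negation, capitalization, punctuation, and "but" handling are omitted. All names here (LEXICON, BOOSTERS, normalize, word_valences, toy_polarity_scores) are hypothetical.

```python
import math

# Hypothetical two-word lexicon (assumed valences, chosen to reproduce the
# outputs above); the real VADER lexicon rates thousands of terms.
LEXICON = {"amazing": 2.8, "bad": -2.5}

# Intensifiers and the increment they add (0.293, as in VADER's reference code).
BOOSTERS = {"really": 0.293, "very": 0.293}

# Damping applied to a booster 1, 2, or 3 words before the sentiment word.
DAMPING = (1.0, 0.95, 0.9)


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def word_valences(text):
    """Return one valence per whitespace-separated word, with boosters applied."""
    words = text.lower().split()
    valences = []
    for i, word in enumerate(words):
        valence = LEXICON.get(word, 0.0)
        if valence != 0.0:
            direction = 1.0 if valence > 0 else -1.0
            # Look back up to three words; farther boosters count for less.
            for distance, damping in enumerate(DAMPING, start=1):
                if i - distance < 0:
                    break
                boost = BOOSTERS.get(words[i - distance], 0.0)
                # A booster pushes the valence further in its own direction.
                valence += direction * boost * damping
        valences.append(valence)
    return valences


def toy_polarity_scores(text):
    """Simplified imitation of VADER's neg/neu/pos/compound scores."""
    valences = word_valences(text)
    # Positive words add (valence + 1), negative words (valence - 1), and each
    # neutral word counts as 1 when computing the proportions.
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(v - 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:  # empty input
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
        "compound": round(normalize(sum(valences)), 4),
    }


print(toy_polarity_scores("Bobby is an amazing guy"))
print(toy_polarity_scores("The food delivered was really very bad"))
```

Because each positive word contributes its valence plus 1 and each neutral word contributes 1, "Bobby is an amazing guy" gets pos = 3.8 / 7.8 ≈ 0.487 even though only one of its five words carries sentiment.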
Let us now add a new column to our DataFrame that stores the polarity scores of each review.
```python
# create a new 'scores' column holding the polarity-score dict of each review
df['scores'] = df['body'].apply(lambda body: sia.polarity_scores(str(body)))
df.head()
```
Similarly, we then create three separate columns for the compound, positive, and negative scores.
```python
# extract the compound, positive, and negative scores into separate columns
df['compound'] = df['scores'].apply(lambda score_dict: score_dict['compound'])
df['pos'] = df['scores'].apply(lambda score_dict: score_dict['pos'])
df['neg'] = df['scores'].apply(lambda score_dict: score_dict['neg'])
df.head()
```
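Pulling out one key per apply call works, but pandas can also expand the score dictionaries into columns in a single step. This is a minimal sketch, assuming the 'scores' column holds dicts like those returned by sia.polarity_scores(); the two-row DataFrame below is hypothetical stand-in data that reuses the example scores shown earlier, not rows from reviews.csv.

```python
import pandas as pd

# Hypothetical stand-in for df: two reviews whose 'scores' dicts reuse the
# example outputs of sia.polarity_scores() shown earlier.
df = pd.DataFrame({
    "body": ["Bobby is an amazing guy", "The food delivered was really very bad"],
    "scores": [
        {"neg": 0.0, "neu": 0.513, "pos": 0.487, "compound": 0.5859},
        {"neg": 0.404, "neu": 0.596, "pos": 0.0, "compound": -0.6214},
    ],
})

# Build one column per dict key, aligned on the original index,
# then join only the columns we need back onto df.
score_cols = pd.DataFrame(df["scores"].tolist(), index=df.index)
df = df.join(score_cols[["compound", "pos", "neg"]])
print(df[["compound", "pos", "neg"]])
```

On the real dataset, the last two statements before the print would replace the three apply calls.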
Next, we create a column named type that labels each review as positive (POS), negative (NEG), or neutral (NEUTRAL) based on the sign of its compound score.
```python
# label each review by the sign of its compound score
df['type'] = ''
df.loc[df.compound > 0, 'type'] = 'POS'
df.loc[df.compound == 0, 'type'] = 'NEUTRAL'
df.loc[df.compound < 0, 'type'] = 'NEG'
df.head()
```
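The code above splits reviews at a compound score of exactly 0. VADER's documentation instead suggests treating scores of at least 0.05 as positive, at most -0.05 as negative, and anything in between as neutral. The sketch below shows that convention as a plain function; label_sentiment and its threshold parameter are hypothetical names, and using it would change which weakly scored reviews count as neutral, so the totals would differ from those reported below.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score to 'POS', 'NEG', or 'NEUTRAL'.

    Scores within (-threshold, threshold) are treated as neutral; the 0.05
    default follows the convention suggested in VADER's documentation.
    """
    if compound >= threshold:
        return "POS"
    if compound <= -threshold:
        return "NEG"
    return "NEUTRAL"


print(label_sentiment(0.5859), label_sentiment(-0.6214), label_sentiment(0.03))
```

Applied to the DataFrame, it would be used as df['type'] = df['compound'].apply(label_sentiment).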
Finally, we loop through the rows and count the total number of positive, negative, and neutral reviews.
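```python
# count the reviews of each type by iterating over the 'type' column
pos = 0
neg = 0
neutral = 0
for review_type in df['type']:
    if review_type == "POS":
        pos += 1
    elif review_type == "NEG":
        neg += 1
    elif review_type == "NEUTRAL":
        neutral += 1
print("Positive :" + str(pos) + " Negative :" + str(neg) + " Neutral :" + str(neutral))

# pandas can produce the same tallies in one call: df['type'].value_counts()
```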
Positive :46060 Negative :13670 Neutral :8256
Therefore, using VADER, we found that our dataset contains 46060 positive reviews, 13670 negative reviews, and 8256 neutral reviews.
Finally, as noted earlier, more than 55% of customers share their opinions or reviews about their purchases on social media, so analyzing those reviews automatically is increasingly valuable. This walkthrough gave you a glimpse of how sentiment analysis is done in practice using NLP tools, and, as discussed above, sentiment analysis has many other applications beyond this one.
I hope this article helped you understand what sentiment analysis is and how to perform it in practice.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.