This article was published as a part of the Data Science Blogathon.
With the advent of social media, a huge amount of data has been generated and continues to be generated. Much of this data captures people's opinions on political matters, on the products they buy, or on the services that companies provide. Mining and analyzing it gives great insight into what people think about a company's services or products. One such application is Tweet Sentiment Visualization, which shows what the public thinks across a wide range of topics and is an important task in Natural Language Processing (NLP). For example, if a company like Paytm wants to know what people think about its product, it can mine social media, visualize the resulting data, and derive insights about its services. Similarly, airline companies can do the same to learn what their customers think about their services and take measures to improve them. Building dashboards makes this easier because we can see the data as graphs and numbers and get a clear picture. Interactive dashboards go a step further: their interactive features let us explore many parameters at once.
There are many dashboarding tools such as Power BI, Tableau, and Google Data Studio. Coding our own dashboard in Python gives us the advantages of lower cost and efficient tracking of KPIs. One library that lets us create such interactive dashboards is Streamlit. In this project, we will use Streamlit to build an interactive dashboard that visualizes the sentiment of tweets posted by customers of airlines in the USA.
The dataset used in this application can be found here. It was created by scraping tweets posted in February 2015 about the various airlines that operate in the United States of America. Each tweet is labeled with its sentiment, and we will build an interactive dashboard on this data to derive insights from it and deploy it on Hugging Face Spaces.
Streamlit is an open-source Python library for quickly creating data science apps. We can also use it to prototype Machine Learning apps with the models we build; it is a very useful library for creating a web interface for almost any application. Streamlit also offers a free hosting platform called Streamlit Share: we create a GitHub repository for our application, connect it to Streamlit Share, and the platform takes care of everything and deploys the application for us.
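If you have not used Streamlit before, here is a minimal, hypothetical app to show the basic idea (the file name minimal_app.py and the widgets used are purely illustrative and not part of this project):

import streamlit as st
import pandas as pd

st.title("Hello Streamlit")                      # page title
st.markdown("A tiny demo of Streamlit widgets")  # descriptive text on the page

path = st.text_input("CSV file to preview", "Tweets.csv")  # simple input widget
if st.button("Preview"):            # the script reruns when the button is clicked
    df = pd.read_csv(path)          # read the CSV the user named
    st.dataframe(df.head())         # render the first rows as an interactive table

Running streamlit run minimal_app.py starts a local server and opens the app in the browser; every time a widget changes, Streamlit reruns the script from top to bottom.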
Hugging Face Spaces is a great way to host our machine learning applications and showcase them to the community. Hosting is free, and we can host any number of applications. We simply create an app.py file containing our code and a requirements.txt file listing the dependencies; the platform builds and deploys the app for us automatically, and we can start using it. We will use this platform to deploy our application.
Firstly, we install all the necessary libraries we need to build this application –
pip install pandas
pip install numpy
pip install streamlit
pip install plotly
pip install wordcloud
pip install matplotlib
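If you want to try the dashboard on your own machine before deploying it, the installs can also be combined into one command, and the app can be started with Streamlit's built-in server (assuming the code below is saved as app.py):

pip install pandas numpy streamlit plotly wordcloud matplotlib
streamlit run app.py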
Next, we code our application in the following way –
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Suppress the warning raised when st.pyplot() is called without a figure argument
st.set_option('deprecation.showPyplotGlobalUse', False)

DATA_ = pd.read_csv("Tweets.csv")

st.title("Sentiment Analysis of Tweets about US Airlines")
st.sidebar.title("Sentiment Analysis of Tweets about US Airlines")
st.markdown("This application is a streamlit dashboard to analyze the sentiment of Tweets")
st.sidebar.markdown("This application is a streamlit dashboard to analyze the sentiment of Tweets")

def run():
    @st.cache(persist=True)
    def load_data():
        DATA_['tweet_created'] = pd.to_datetime(DATA_['tweet_created'])
        return DATA_

    data = load_data()

    # Feature 1: show a random tweet for the selected sentiment
    st.sidebar.subheader("Show random tweet")
    random_tweet = st.sidebar.radio('Sentiment', ('positive', 'neutral', 'negative'))
    st.sidebar.markdown(data.query('airline_sentiment == @random_tweet')[["text"]].sample(n=1).iat[0, 0])

    # Feature 2: number of tweets by sentiment, as a histogram or a pie chart
    st.sidebar.markdown("### Number of tweets by sentiment")
    select = st.sidebar.selectbox('Visualization type', ['Histogram', 'Pie chart'])
    sentiment_count = data['airline_sentiment'].value_counts()
    sentiment_count = pd.DataFrame({'Sentiment': sentiment_count.index, 'Tweets': sentiment_count.values})
    if not st.sidebar.checkbox("Hide", True):
        st.markdown("### Number of tweets by sentiment")
        if select == "Histogram":
            fig = px.bar(sentiment_count, x='Sentiment', y='Tweets', color='Tweets', height=500)
            st.plotly_chart(fig)
        else:
            fig = px.pie(sentiment_count, values='Tweets', names='Sentiment')
            st.plotly_chart(fig)

    # Feature 3: tweet locations filtered by the hour of the day
    st.sidebar.subheader("When and Where are users tweeting from?")
    hour = st.sidebar.slider("Hour of day", 0, 23)
    modified_data = data[data['tweet_created'].dt.hour == hour]
    if not st.sidebar.checkbox("Close", True, key='1'):
        st.markdown("### Tweet locations based on the time of day")
        st.markdown("%i tweets between %i:00 and %i:00" % (len(modified_data), hour, (hour + 1) % 24))
        st.map(modified_data)
        if st.sidebar.checkbox("Show Raw Data", False):
            st.write(modified_data)

    # Feature 4: breakdown of airline tweets by sentiment for the chosen airlines
    st.sidebar.subheader("Breakdown airline tweets by sentiment")
    choice = st.sidebar.multiselect('Pick airline',
                                    ('US Airways', 'United', 'American', 'Southwest', 'Delta', 'Virgin America'),
                                    key='0')
    if len(choice) > 0:
        choice_data = data[data.airline.isin(choice)]
        fig_choice = px.histogram(choice_data, x='airline', y='airline_sentiment',
                                  histfunc='count', color='airline_sentiment',
                                  facet_col='airline_sentiment',
                                  labels={'airline_sentiment': 'tweets'},
                                  height=600, width=800)
        st.plotly_chart(fig_choice)

    # Feature 5: word cloud for the selected sentiment
    st.sidebar.header("Word Cloud")
    word_sentiment = st.sidebar.radio('Display word cloud for what sentiment?', ('positive', 'neutral', 'negative'))
    if not st.sidebar.checkbox("Close", True, key='3'):
        st.header('Word cloud for %s sentiment' % (word_sentiment))
        df = data[data['airline_sentiment'] == word_sentiment]
        words = ' '.join(df['text'])
        # Drop URLs, @mentions and the 'RT' token before building the word cloud
        processed_words = ' '.join([word for word in words.split()
                                    if 'http' not in word and not word.startswith('@') and word != 'RT'])
        wordcloud = WordCloud(stopwords=STOPWORDS, background_color='white',
                              height=640, width=800).generate(processed_words)
        plt.imshow(wordcloud)
        plt.xticks([])
        plt.yticks([])
        st.pyplot()

if __name__ == '__main__':
    run()
Explanation of the above code –
Firstly, we import all the installed libraries as shown. Next, we read the dataset using pandas 'pd.read_csv()'. After this, the title of the app page is set using 'st.title()', and a short description of the app is added with 'st.markdown()'. The same is done in the sidebar. In this application, we create five features: showing a random tweet for a selected sentiment, plotting the number of tweets by sentiment, mapping tweet locations by the hour of the day, breaking down airline tweets by sentiment, and drawing a word cloud for a selected sentiment.
So firstly, we create a 'load_data()' function to load the necessary data from the dataset. We cache this function so that it does not have to re-run every time the script re-executes, which saves time and makes our application faster.
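For reference, here is a minimal sketch of that caching pattern as a slight variant of the code above, with the CSV read placed inside the cached function so the file is parsed only once per session rather than on every rerun (on newer Streamlit versions, st.cache_data plays the role of st.cache):

import pandas as pd
import streamlit as st

@st.cache(persist=True)              # memoize the return value across reruns
def load_data(path="Tweets.csv"):    # the default path is just illustrative
    df = pd.read_csv(path)
    df['tweet_created'] = pd.to_datetime(df['tweet_created'])
    return df

data = load_data()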
After this, we build our first feature, 'Show random tweet'. We create a subheader as shown in the code and add radio buttons, one for each sentiment: 'positive', 'neutral', and 'negative'. We then query the dataset to fetch a random tweet carrying the selected label and display it in the sidebar.
Next, to visualize the number of tweets by sentiment, we count the sentiment labels of the original dataset and put the counts into a small DataFrame. We then create a select box with two options, Histogram and Pie chart, and use Plotly charts to draw the selected plot.
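To make that reshaping step concrete, here it is on its own with the intermediate result described in comments (the exact counts depend on your copy of the dataset):

# value_counts() gives a Series indexed by sentiment label; we reshape it into a
# two-column DataFrame so plotly express can refer to the columns by name.
sentiment_count = data['airline_sentiment'].value_counts()
sentiment_count = pd.DataFrame({'Sentiment': sentiment_count.index,
                                'Tweets': sentiment_count.values})
# sentiment_count now has one row per label ('negative', 'neutral', 'positive')
# and a 'Tweets' column holding the number of tweets for each label.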
To find when and where the tweets are coming from, we use Streamlit's map function, which plots the locations present in the dataset on a map. We also create a slider to filter the tweets by the hour of the day. In this way, as shown in the code above, we create this feature.
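One caveat worth flagging: st.map() looks for latitude and longitude columns (named something like 'lat'/'latitude' and 'lon'/'longitude') in the DataFrame it is given. If your copy of Tweets.csv stores coordinates only as a single tweet_coord string column, a rough sketch of deriving the two columns could look like the following; the column names here are assumptions about the raw Kaggle file, so check your own CSV first:

import ast

# tweet_coord holds strings like "[40.75, -73.99]" (or NaN); parse them into lists
coords = data['tweet_coord'].dropna().apply(ast.literal_eval)
data.loc[coords.index, 'latitude'] = coords.str[0]    # first element of each list
data.loc[coords.index, 'longitude'] = coords.str[1]   # second element of each list

# st.map() can now find the coordinates it needs
st.map(data.dropna(subset=['latitude', 'longitude']))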
To plot the sentiments by airline, we use Streamlit's plotly_chart feature. First, we create a multi-select box to pick the airlines, then we take the sentiments of the tweets about those airlines and plot them as bar charts, one facet per sentiment. Finally, we create a word cloud by joining the text of all tweets with the selected sentiment, removing URLs, mentions, and 'RT', and plotting the remaining words. In this way, we create our application.
So far, we have created our application. We can now deploy it and interact with it to get insights. For this, we use Hugging Face Spaces to host the application. Go to the website and click the button to create a Space. You will see a page asking for the name of your repository: give your app a name, choose Streamlit under the SDK option, choose a license, and click 'Create space'. Then create a new file app.py, put the code in that file, and commit the changes. Next, create a new file requirements.txt and paste the following into it –
streamlit
plotly
wordcloud
matplotlib
Click commit changes, and Hugging Face Spaces takes care of the rest and deploys the application. Here is the link to the application I created; check it out if you have any doubts about the code –
Interactive Tweet Sentiment Visualization Dashboard – a Hugging Face Space by rajesh1729
That's all, folks! I hope you liked my article on Tweet Sentiment Visualization.
We created an NLP sentiment analysis dashboard using Streamlit and deployed it on Hugging Face Spaces. These kinds of apps are very useful for business executives to make data-driven decisions. If you have any doubts regarding the code, please comment below so that I can answer your queries.
Read more articles on Tweet Sentiment Visualization on our blog.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.