Data visualization, a pivotal element in data science, simplifies complex information, allowing quick comprehension and pattern recognition. Charts and graphs make data analysis easier, aiding in strategic decisions and business growth. Versatile in application, they support sales forecasts, stock analysis, project management, and more. Visuals, like technology, are integral to modern life, enhancing our understanding of the world. Our eyes are adept at extracting insights from visuals, which communicate efficiently and captivate attention. While spreadsheets house data, visual representations transform it into accessible, meaningful narratives. In this article, we will look at creating interactive plots with Plotly in Python.
This article was published as a part of the Data Science Blogathon
Data visualization is the representation of data through the use of common graphics, such as charts, plots, infographics, and even animations. Findings from data need to be presented properly: data visuals must attract the attention of the audience and convey the appropriate message. Whether in finance, marketing, sales, technology, engineering, research, or human resources, good data analysis and visualization skills are valuable and will be in high demand in the days to come. Data must convey a specific message and explain things clearly; good visuals often make the process easy and simple.
Machine Learning helps in making predictions, doing predictive analytics, and other tasks; similarly, good visualizations help in data exploration.
There are various types of data visualizations, and all of them have different purposes and needs.
Data that has a timestamp associated with it is considered to be time-series data. Examples of such data can be stock prices, sales data over time, rainfall and temperature at a place along with the time, road traffic at a particular place, and so on. Such data helps in tracking trends and changes over time. We might want to analyze stock prices or see which day of the week traffic is highest, and so on. So, we check the data trends over time.
Data fields are often related to each other. The sales of a supermarket can depend heavily on the vehicular traffic on the road in front of it. More hours of study might lead to higher test scores for students, and so on. So, data visualizations must make analysts and users aware of the relationships in the data.
The count and frequency of items are important, and we need to keep track of how often something happens. Many data visualizations help in determining frequency.
Many data visualizations are made to analyze the distribution of a particular variable and see any importance, risk or value the data might hold. Data can be tested with various metrics and plotted to understand all the important parameters. Effective data visualization is a crucial step in analytics.
The amount of data and information on the internet is increasing day by day. Our every action online is stored as data. Be it a website login, a purchase, an Uber trip, or an online food delivery, everything is tracked. The age of Big Data is here. The vast amount of data needs processing and analysis. Data has to be made more understandable, readable, and interpretable. The real-life applications and uses of good data visualization methods are immense. A data-driven organization leverages data to make efficient decisions and will likely perform better than an organization that does not leverage data.
The advent of large data storage facilities has made data available for various purposes. Large organizations like Google, Facebook, Amazon, etc., leverage their data for a wide variety of purposes. Improved business decisions are a direct outcome of data-driven decision-making.
Let us consider a hypothetical scenario. A teacher has the marks of all students in her class. Along with it, she has data for students’ past marks and other grades. All of this data is, however, in spreadsheets. Now, she wants to analyze from her data which students are performing the best, which students’ exam performance has improved, and which students’ performance has decreased, and so on. All this might be possible with spreadsheets, but the amount of effort that needs to be given is too high.
Thankfully, Excel has built-in data visualization tools, and the data can be analyzed simply and easily. The teacher can easily check all the data, find out who had the highest score, etc. Data analysis tools are there to help us in such cases. Nowadays, we are at liberty to use Excel, Power BI, and Tableau as no-code solutions, and we can also use Python and R if we want custom solutions and data pipelines. These tools serve the purpose of processing the data and making our desired visuals. The use of such tools helps us in automating the data visualization process.
Data tells its tale: properly presented data can explain a lot of things. One of the first known graphs of statistical data was made by Dutch astronomer Michael Florent van Langren. Napoleon Bonaparte’s Russian campaign of 1812 was mapped by Charles Joseph Minard. He used statistical graphs to map the campaign, and he combined multiple metrics: the number of troops, temperature, distance, directions, and more to make a proper visual.
Over time, data visualization kept on improving. The advent of computers and displays meant that data could be processed and presented efficiently. Data Visualization tools and software can analyze vast amounts of data at a very high speed.
Python is an excellent tool for data visualization. Python has been around for a long time and can be used for a wide variety of tasks. It can be used for statistical analysis, machine learning, deep learning, web development, and so on. The easy-to-use nature of Python and a lot of libraries make Python useful for complex numeric and scientific calculations. The uses of Python are immense. The popularity of using Python is also increasing.
Python is open-source and free to use, and there are a lot of libraries and support available for Python. Python can also be used on many platforms. It has many support forums and helps available all over the internet. The great community support for Python and the large number of resources make learning Python for data analysis a great investment. It is flexible, scalable, and has a wide range of libraries and regular updates available. This can lower the data analysis budget and costs. Purchasing licenses for Power BI and Tableau can be expensive in earlier stages. The libraries in Python are constantly evolving, making the process easier and simpler. The data-oriented packages in Python can speed up and simplify the entire data process.
The data analysts’ toolbox can have many tools: Power BI, Tableau, Excel, R, etc., but Python must also be a part of it. The hyper flexibility of Python makes it very useful, and it is highly popular among data analysts and data scientists. Python has many IDEs and environments where data can be visualized. One can use Google Colab, Kaggle Kernel, Jupyter Notebooks, and so on. The graphical options in Python make using Python very easy for data analysis. Python is evolving constantly, is multi-featured, and is highly functional.
Python started as a general-purpose programming language. But, the improved readability of Python made it a good tool for data analysis.
One of the best-known plotting libraries for Python is Matplotlib. It is used for 2D plotting, charting, and data representation. It was introduced in 2002 by John Hunter. The introduction of Matplotlib propelled the growth of Python as a tool for research, data analysis, and engineering. The visuals are easy to plot and interpret. For example, you can plot bar graphs, line charts, etc.
Seaborn is a great visualization library in Python used for plotting statistical models and complex relations among data. It can plot complex plots like Heatmaps, Relational Plots, Categorical Plots, Regression Plots, etc. Seaborn made complex data analysis and visualization easy and simple to execute.
Now, consider the limitations of Seaborn and Matplotlib. First of all, their plots are static by default. The plots are produced as images, so we cannot hover our cursor over them and get exact values. Nor can we easily use them to make interactive plots on websites. A good solution to all this is Plotly.
Plotly is a Montreal-based AI and Analytics company. They focus on the development of Analytics tools, mainly Dash and Chart Studio. They have also released the free and open-source plotting library “Plotly” for Python, R, MATLAB, and Julia.
Plotly produces interactive graphs, can be embedded on websites, and provides a wide variety of complex plotting options. The graphs and plots are robust, and a wide variety of people can use them. The visuals are of high quality and easy to read and interpret.
Plotly makes a wide variety of charts, including basic and statistical charts, maps, 3D charts, subplots, and more.
I have prepared and kept the code in a Kaggle Notebook, and I will leave the link later. Please refer to it so that you are able to follow along. First, we import the necessary libraries.
import numpy as np
import pandas as pd
import plotly.express as px
Now, we read some data we will be using.
The two datasets used here are the Melbourne Housing Snapshot and a superstore sales dataset, both read from Kaggle input paths.
Both datasets are good beginner datasets, with a lot of information and data fields. The Melbourne Housing data has various real estate data points and deals with the housing sector. The data pertains to the housing and commercial property sector.
The superstore data is concerned with sales and the retail sector. Various aspects of sales and retail are present in the data.
Now, we proceed with reading the data.
melb= pd.read_csv("/kaggle/input/melbourne-housing-snapshot/melb_data.csv")
sales=pd.read_csv("/kaggle/input/sales-forecasting/train.csv")
The Melbourne data is a bit large. For the sake of simplicity, we are taking only 1000 data points from the dataset.
melb=melb[0:1000]
Scatterplots are a great way to analyze data distribution and the relation between various data fields. Various trends in data can be analyzed and plotted on the x-axis and y-axis. Plotting scatter plots with Plotly is very easy.
x=[0, 1, 2, 3, 4, 5, 6]
y=[0, 2, 4, 5, 5.5, 7, 9]
# pass x and y by keyword; the first positional argument of px.scatter is the data frame
fig = px.scatter(x=x, y=y)
fig.show()
The good thing about Plotly is that the plots are interactive. We can hover over the plots and see exact data values and other information. I will share the link to the notebook, where you can have a look. Also, do upvote the Kaggle Notebook if you like it.
We take the iris dataset now.
Let us make a scatter plot to understand the data distribution.
# importing the library
import plotly.express as px
#we take the iris dataset now
df = px.data.iris()
# let's have a look at the data once
print(df.head())
Making some changes to the parameters.
fig = px.scatter(df, x="sepal_width", y="sepal_length", color='petal_width')
fig.show()
Adding some styles to the plots.
fig = px.scatter(df, x="sepal_width", y="sepal_length", color='species')
fig.show()
Adding marker symbols, still using the iris dataset.
fig = px.scatter(df, y="petal_length", x="petal_width", color="species", symbol="species")
fig.update_traces(marker_size=10)
fig.show()
Now, we start plotting the Melbourne dataset and add marginal plots along the axes.
fig = px.scatter(melb, x="Lattitude", y="Longtitude", marginal_x="histogram", marginal_y="rug",color="Type")
fig.show()
Now, let’s change the parameters.
fig = px.scatter(melb, x="Price", y="YearBuilt", color="Type", facet_col="Rooms", )
fig.show()
We will now change the parameters.
fig = px.scatter(melb, x="Price", y="YearBuilt", color="Rooms", facet_col="Type", )
fig.show()
fig = px.scatter(melb, x="BuildingArea", y="Distance", color="Rooms", facet_col="Type", )
fig.show()
fig = px.scatter(melb, x="BuildingArea", y="Distance", color="Car", facet_col="Type", )
fig.show()
As we can see, all the plots in Plotly are really nice and well-designed. All the colors are great to look at and see.
Regarding scatterplots, we can also make Linear Regression plots using Plotly. We take the tips dataset, and we plot the linear relationship between total bills and tips.
#linear regression
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()
We can see that the linear plot is quite well made, and all the plots are interactive.
Check the Kaggle notebook here: Link
Line plots are great for visualizing continuous data. Time series data, mathematical functions, etc., are some of the data which can be plotted using Line Plots. They reveal data trends, maxima, and minima. We can use them for time series data like stocks, sales over time, and so on. It is a great way to plot a 2D relationship.
Let us use a line plot to plot a mathematical function.
x = np.linspace(0, 10, 1000)
y = 3*x**2 - 2*x**2 + 4*x - 5
fig = px.line(x=x, y=y, labels={'x': 'x', 'y': 'y'})
fig.show()
The plot is interactive, so we can hover over it to understand the values.
Now, let us plot a sin() function.
x = np.linspace(0, 10, 1000)
y= np.sin(x)
fig = px.line(x=x ,y =y,labels={'x':'x', 'y':'sin(x)'})
fig.show()
Now, we shall plot some time series data, starting with some stock data.
“MSFT” is the stock symbol for Microsoft.
df = px.data.stocks()
fig = px.line(df, x='date', y="MSFT")
fig.show()
Now, I will include more stocks in the plot.
GOOG stands for Google, FB stands for Facebook, and AMZN stands for Amazon.
df = px.data.stocks()
fig = px.line(df, x='date', y=["MSFT","GOOG",'FB',"AMZN"])
fig.show()
We can see that all the plots are visually appealing and look nice with contrasting colors.
Now, we use some data from the Plotly library for some sample plotting.
df = px.data.gapminder().query("continent == 'Oceania'")
Let us check what the data looks like.
df.head()
We plot the data on a line plot now.
fig = px.line(df, x='year', y='pop', color='country')
fig.show()
We can see that the hover card shows the exact values and other parameters for each point on the line plot. Now, add some markers so that the data points are easily visible.
fig = px.line(df, x='year', y='pop', color='country',markers=True)
fig.show()
The plot has been made!
Now, a plot with different types of visuals will be made.
import plotly.graph_objects as go
#combined plots
N=100
random_x = np.linspace(0, 5, N)
random_y0 = np.random.randn(N) + 5
random_y1 = np.random.randn(N)
random_y2 = np.random.randn(N) - 5
fig = go.Figure()
# Add traces
fig.add_trace(go.Scatter(x=random_x, y=random_y0,
mode='lines+markers',
name='lines+markers'))
fig.add_trace(go.Scatter(x=random_x, y=random_y1,
mode='markers',
name='markers'))
fig.add_trace(go.Scatter(x=random_x, y=random_y2,
mode='lines',
name='lines'))
fig.show()
Such a type of plot is called a combined plot.
Such combined plots are a great way to understand the data from different perspectives.
Barplots are used to provide a straightforward comparison of data. They represent categorical data with rectangular bars of variable height. Plotting bar charts in a graphing library like Plotly is very easy and simple. Let us start by plotting the population of Australia over time.
df = px.data.gapminder().query("country == 'Australia'")
fig = px.bar(df, x='year', y='pop')
fig.show()
Let us work on the sales data we had taken earlier. But, for the sake of simplicity, we take only the initial 100 data points.
sales=sales[0:100]
fig = px.bar(sales, x="State", y="Sales")
fig.show()
It also individually shows the sales figure of each sale.
Now, we analyze the sales category, and for that, we bring in another parameter.
fig = px.bar(sales, x="State", y="Sales",color='Category')
fig.show()
Now, we plot the sales of each category and add a parameter to distinguish segments.
fig = px.bar(sales, x="Category", y="Sales",color='Segment')
fig.show()
fig = px.bar(sales, x="Category", y="Sales",color="Segment",pattern_shape="Segment", pattern_shape_sequence=[".", "x", "+"])
fig.show()
Now, let us add hues and more advanced colour interpretations to a plot. These improve the readability of the plot.
data = px.data.gapminder()
data_canada = data[data.country == 'Canada']
fig = px.bar(data_canada, x='year', y='pop',
hover_data=['lifeExp', 'gdpPercap'], color='lifeExp',
labels={'pop':'population of Canada'}, height=400)
fig.show()
We can clearly see here that, with time, the population of Canada has increased, and so has life expectancy. Better healthcare, improved medicines, and increased quality of life contributed to this.
The hue of life expectancy becomes brighter, as shown in the color bar to the right.
Now, let us check the GDP per capita.
fig = px.bar(data_canada, x='year', y='pop',
hover_data=['lifeExp', 'gdpPercap'], color='gdpPercap',
labels={'pop':'population of Canada'}, height=400)
fig.show()
The GDP per capita improved over time, and we can take that as an indication that general life quality improved with time.
Let us make some stacked bar charts. One important thing to consider while plotting and representing data is which data to plot, and which chart to plot it with. Choosing the right type of chart is very important, as it prevents visualization mistakes.
Let us take into consideration some new data.
df = px.data.gapminder().query("continent == 'Oceania'")
df.head()
fig = px.bar(df, x='year', y='pop',barmode='stack',color='country')
fig.show()
Stacked bar charts show the summation of individual entries as well as the entire plot. So, it is a good way to understand the contribution of each individual factor toward a complete entity.
Let us see the life expectancy data.
fig = px.bar(df, x='year', y='lifeExp',barmode='stack',color='country')
fig.show()
Now we will see custom visuals.
x = ['Suzuki', 'Honda', 'Tata']
y = [100, 40, 60]
# Use the hovertext kw argument for hover text
fig = go.Figure(data=[go.Bar(x=x, y=y,
hovertext=['50 % Share', '20 % Share', '30 % Share'])])
fig.update_layout(title_text='Sales Data')
fig.show()
Let us plot the populations of the most populous nations in Asia.
# label each bar with its population value
df = px.data.gapminder().query("continent == 'Asia' and year == 2007 and pop > 8000000")
fig = px.bar(df, y='pop', x='country', text='pop')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.show()
So, we plotted a wide variety of bar plots and analyzed data. Let us try a different type of plot now.
Pie charts are used to understand the composition of data and analyze part-to-whole relationships in data. Pie charts (and doughnut charts) plot the percentage contribution of each value to the whole.
Let us take into consideration the sales dataset again. We plot a pie chart of the sales from each state. The percentage contribution of each state will get plotted. This will show many valuable insights.
fig = px.pie(sales, values='Sales', names='State', title='Sales Per State in US')
fig.show()
So, we can see that the majority of the sales are from California.
Now, we plot the sales segments and their contribution.
Now, we see the sales per category.
fig = px.pie(sales, values='Sales', names='Category', title='Sales Per Category in US')
fig.show()
So, we can see that Furniture had the highest sales.
Now, we will make some more advanced plots, and we shall be using the tips dataset.
#setting colours
df = px.data.tips()
fig = px.pie(df, values='tip', names='day', color_discrete_sequence=px.colors.sequential.RdBu)
fig.show()
So, the plots are entirely customizable.
labels = ['Apple','Microsoft','Amazon','Alphabet']
values = [2252, 1966, 1711, 1538]
fig = go.Figure(data=[go.Pie(labels=labels, values=values, textinfo='label+percent',
insidetextorientation='radial'
)])
fig.show()
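The percentages that `textinfo='label+percent'` displays are simply each value divided by the total. As a quick check, the shares for the values above can be computed by hand:

```python
labels = ['Apple', 'Microsoft', 'Amazon', 'Alphabet']
values = [2252, 1966, 1711, 1538]

# Each slice's share is its value divided by the sum of all values.
total = sum(values)
shares = {label: 100 * value / total for label, value in zip(labels, values)}

for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

These figures should match the percentages shown on the pie chart, up to rounding.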
Let us make a doughnut chart now.
#donut chart
labels = ['CAR','BIKE','BUS','TRAIN']
values = [1500, 2500, 6800, 9000]
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.show()
The real difference between a doughnut chart and a pie chart is mainly the appearance and the way someone wants to plot the data.
Let us now customise the chart by pulling the slices out from the centre.
# pie chart with pulled-out slices
labels = ['CAR','BIKE','BUS','TRAIN']
values = [1500, 2500, 6800, 9000]
fig = go.Figure(data=[go.Pie(labels=labels, values=values, pull=[0.1, 0.1, 0.2, 0.1])])
fig.show()
So, we can see that Plotly offers a high level of customization and visually appealing plots.
Check out the code here: Kaggle
Bubble charts are a great way to show magnitude by adjusting the size of the circle. Bubble charts can be easily made in Python.
fig = go.Figure(data=[go.Scatter(
x=[1, 2, 3, 4], y=[10, 12, 15, 16],
mode='markers',
marker_size=[20, 40, 50, 60])
])
fig.show()
The plot is made easily.
df = px.data.gapminder()
fig = px.scatter(df.query("year==2007"), x="gdpPercap", y="lifeExp", size="pop", color="continent",
hover_name="country", log_x=True, size_max=60)
fig.show()
Let us use the tips data again.
tips = px.data.tips()
fig = px.scatter(tips, x="total_bill", y="size", size="tip", color="tip",
size_max=20)
fig.show()
Bubble charts are a great way to visualise data and understand insights.
Dot Plots are a different way of presenting scatter plots and showing the data distribution properly.
We are taking a new dataset.
stud= pd.read_csv("/kaggle/input/students-performance-in-exams/StudentsPerformance.csv")
I will share the link to all codes in the end; please have a look there.
fig = px.scatter(stud, x="math score", y="parental level of education", color="gender",
title="Student Performance in Exams"
)
fig.show()
Let us try another plot.
fig = px.scatter(stud, x="writing score", y="parental level of education", color="lunch",
title="Student Performance in Exams"
)
fig.show()
Horizontal bar charts are another way to present the traditional bar chart, with the categories on the y-axis and the values on the x-axis.
fig = px.bar(stud, x="reading score", y="parental level of education",color='gender', orientation='h')
fig.show()
A Gantt chart is a special type of bar chart that shows the progress of a project or work. Different sections of a bigger project can be plotted based on their timelines and progress.
Let us plot some sample Gantt Charts.
df = pd.DataFrame([
dict(Task="Development", Start='2012-01-20', Finish='2012-02-20'),
dict(Task="Website Design", Start='2012-01-10', Finish='2012-01-30'),
dict(Task="Deployment", Start='2012-02-20', Finish='2012-03-30'),
dict(Task="Marketing", Start='2012-02-25', Finish='2012-04-15')
])
fig = px.timeline(df, x_start="Start", x_end="Finish", y="Task")
fig.update_yaxes(autorange="reversed")
fig.show()
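The length of each bar in the timeline corresponds to the task's duration. As a small sketch, the durations can be computed from the same start and finish dates with pandas; the `Days` column name is an assumption introduced for illustration.

```python
import pandas as pd

tasks = pd.DataFrame([
    dict(Task="Development", Start='2012-01-20', Finish='2012-02-20'),
    dict(Task="Website Design", Start='2012-01-10', Finish='2012-01-30'),
    dict(Task="Deployment", Start='2012-02-20', Finish='2012-03-30'),
    dict(Task="Marketing", Start='2012-02-25', Finish='2012-04-15'),
])

# Parse the date strings so they can be subtracted.
tasks["Start"] = pd.to_datetime(tasks["Start"])
tasks["Finish"] = pd.to_datetime(tasks["Finish"])

# Duration of each task in whole days.
tasks["Days"] = (tasks["Finish"] - tasks["Start"]).dt.days
print(tasks[["Task", "Days"]])
```

A column like this can also be passed to the `hover_data` argument so that the duration appears when hovering over a bar.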
Let us add a few more features.
df = pd.DataFrame([
dict(Task="Development", Start='2012-01-20', Finish='2012-02-20', Team="Team A"),
dict(Task="Website Design", Start='2012-01-10', Finish='2012-01-30', Team="Team B"),
dict(Task="Deployment", Start='2012-02-20', Finish='2012-03-30', Team="Team A"),
dict(Task="Marketing", Start='2012-02-25', Finish='2012-04-15', Team="Team C")
])
fig = px.timeline(df, x_start="Start", x_end="Finish", y="Task", color="Team")
fig.update_yaxes(autorange="reversed")
fig.show()
Now, let us add hues based on team size.
df = pd.DataFrame([
dict(Task="Development", Start='2012-01-20', Finish='2012-02-20', Team="Team A",Team_Size=20),
dict(Task="Website Design", Start='2012-01-10', Finish='2012-01-30', Team="Team B",Team_Size=15),
dict(Task="Deployment", Start='2012-02-20', Finish='2012-03-30', Team="Team A",Team_Size=20),
dict(Task="Marketing", Start='2012-02-25', Finish='2012-04-15', Team="Team C",Team_Size=32)
])
fig = px.timeline(df, x_start="Start", x_end="Finish", y="Task",color="Team_Size")
fig.update_yaxes(autorange="reversed")
fig.show()
Box Plots are a great way to understand data distribution. They depict numerical data using quartiles.
fig = px.box(stud, y="math score")
fig.show()
The minimum on a box plot is the lowest data point, excluding any outliers.
The maximum is the largest data point, again excluding any outliers.
The median is the middle value of the data distribution.
The lower quartile is the 25th percentile, and the upper quartile is the 75th percentile.
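These quantities can be computed by hand to see what the box represents. The sketch below uses made-up scores and the common 1.5 × IQR rule for outliers; Plotly's own quartile computation may differ slightly from NumPy's default interpolation, so treat the numbers as illustrative.

```python
import numpy as np

scores = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])  # made-up values with one outlier

# Quartiles and median (NumPy's default linear interpolation).
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points that are still inside the fences.
inside = scores[(scores >= lower_fence) & (scores <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```

Here the value 50 lies above the upper fence, so it would be drawn as a separate point rather than as the end of the whisker.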
Let us try some customized box plots.
fig = px.box(stud, x='gender',y="math score")
fig.show()
fig = px.box(stud, x='gender',y="math score", points="all")
fig.show()
fig = px.box(stud, x='gender',y="math score", color="test preparation course")
fig.show()
Now, let us add a notch.
fig = px.box(stud, x='gender',y="math score", color="test preparation course", notched=True)
fig.show()
Histograms are an excellent way to understand the frequency distribution of numerical data.
fig = px.histogram(stud, x="math score", nbins=20, color="gender")
fig.show()
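A histogram simply counts how many values fall into each bin. The sketch below shows that counting step with NumPy on made-up scores, using fixed bin edges. Note that in Plotly, `nbins` is treated as an upper limit and the library picks convenient bin edges itself, so the bins it draws may not match a manual choice.

```python
import numpy as np

scores = np.array([35, 42, 58, 61, 67, 70, 74, 88, 91, 99])  # made-up scores

# Five equal-width bins between 0 and 100: [0,20), [20,40), ..., [80,100].
counts, edges = np.histogram(scores, bins=5, range=(0, 100))

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {count}")
```

The bar heights in a histogram are exactly these per-bin counts.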
Let us customize it.
fig = px.histogram(stud, x="math score", nbins=20, color="gender", marginal="rug")
fig.show()
Let us make a data visual to show the proper representation of data by adding a box plot as well.
fig = px.histogram(stud, x="reading score", y="math score", color="gender", marginal="box",
hover_data=stud.columns)
fig.show()
Such visuals are really great for understanding how the data is spread, and we can interact with the plots.
fig = px.histogram(stud, x="reading score", y="writing score", color="parental level of education", marginal="box",
hover_data=stud.columns)
fig.show()
We had a look at major visualization methods in Plotly.
Code (Kaggle Notebooks):
Plotly Python library is an open-source module that is used for data visualization and supports various graphs like line charts, scatter plots, bar charts, histograms, area plots, etc. Plotly Python library produces interactive graphs, can be embedded on websites, and provides a wide variety of complex plotting options. The graphs and plots are robust, and a wide variety of people can use them. The interactivity also offers a number of advantages over static matplotlib plots, such as saving time when initially exploring your dataset.
A. The Plotly Python library is an open-source plotting library that covers statistical, financial, geographic, scientific, and 3D use cases.
A. Plotly has several advantages. One of the main ones is that its plots are interactive.
A. We can access this API in Python using the plotly package. To install it, open a terminal and type pip install plotly.