You have probably read plenty of articles about the buzz around “Machine Learning”, “Data Scientist”, “Data Visualization”, and so on. Some have branded data science the sexiest job of the 21st century. Anaconda’s State of Data Science Report 2020 states that 21% of the time is given to data visualization. It is therefore important to use a tool or library that helps us in the flow of storytelling.
Data visualization is one of the most basic and important steps in predictive modelling. People often start with data visualization to gain insights and to understand the data through Exploratory Data Analysis (EDA). Charts and visuals are usually a better option than studying tables of values, because people take in visuals more readily than plain text or numbers. So let’s make clear, elegant, and insightful charts that our audience can understand easily, always assuming the audience is non-technical. Less is more: proper visualization brings clarity to the data, which helps in decision-making. Let’s go through a quick guide to visualization with Bokeh.
“Even if your role does not directly involve the nuts and bolts of data science, it is useful to know what data visualization can do and how it is realized in the real world.”
– Ramie Jacobson
Bokeh is an interactive visualization library for Python. Its best feature is highly interactive graphs and plots that target modern web browsers for presentation. Bokeh helps us make elegant, concise charts and offers a wide range of chart types.
Bokeh primarily works by converting the data source into JSON, which is then used as input for BokehJS. Some of the best features of Bokeh are:
With Bokeh, we can easily visualize large data sets and create different charts in an attractive and elegant manner.
There are plenty of visualization libraries, so why use Bokeh? Let’s see why.
We can use the Bokeh library to embed charts in a web page, build live dashboards, and create apps. Bokeh provides its own styling options and widgets for the charts. This makes it convenient to embed Bokeh charts in any website built with Flask or Django.
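As a rough illustration of embedding, the sketch below uses bokeh.embed.components() to turn a figure into a script and a div that a Flask or Django template could include. The toy data and the variable names are assumptions made for this example only.

```python
from bokeh.embed import components
from bokeh.plotting import figure

# Toy data, assumed purely for illustration
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

fig = figure(title="Embedded chart", width=400, height=300)
fig.line(x, y, line_width=2)

# components() returns an HTML <script> block and a <div> placeholder.
# A web template can place both in the page, provided the page also
# loads the BokehJS resources.
script, div = components(fig)
print(div)
```

In a Flask or Django view, the two strings would typically be passed to the template and rendered unescaped.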
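Bokeh mainly provides two interface levels, bokeh.models and bokeh.plotting, that are simple and easy to adopt; the bokeh.application and bokeh.server packages are described after them.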
bokeh.models
Bokeh models provide a low-level interface that gives application developers the most flexibility.
bokeh.plotting
Bokeh plotting provides a high-level interface for creating visual glyphs. It is built on top of the bokeh.models module and contains the definition of the figure class, which is the simplest way to create a plot.
bokeh.application
The Bokeh application package provides a lightweight factory that is used to create Bokeh documents.
bokeh.server
Bokeh server is used to publish and share interactive charts and apps.
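To make the relationship between the two interface levels concrete, here is a small sketch using toy data: a figure created with bokeh.plotting is itself a bokeh.models object, and each glyph method returns a low-level renderer that can be adjusted further. The values are made up for illustration.

```python
from bokeh.models import GlyphRenderer, Plot
from bokeh.plotting import figure

# High-level interface: figure() builds a ready-to-use plot
fig = figure(title="Interfaces")
renderer = fig.line([1, 2, 3], [4, 6, 5])

# Low-level interface: the same objects are bokeh.models instances
print(isinstance(fig, Plot))                # the figure is a Plot model
print(isinstance(renderer, GlyphRenderer))  # glyph methods return renderers

# Model properties can be changed directly after creation
renderer.glyph.line_width = 3
```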
To install with pip, run the following command (pandas-bokeh installs bokeh as a dependency)
pip install pandas-bokeh
To install in a conda environment, run the following command
conda install -c patrikhlobil pandas-bokeh
Import the necessary packages for Bokeh
import pandas as pd
import pandas_bokeh
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
pandas_bokeh.output_notebook()
pd.set_option('plotting.backend', 'pandas_bokeh')
bokeh.plotting is the interface for creating interactive visuals; from it we import figure, which acts as a container that holds our charts.
from bokeh.plotting import figure
We need the below command to display the charts
from bokeh.io import show, output_notebook
We need the below command to display the output of the charts in the Jupyter notebook
pandas_bokeh.output_notebook()
To embed the charts as HTML, run the below command
pandas_bokeh.output_file(filename)
HoverTool displays a value when we hover over a data point with the mouse pointer, and ColumnDataSource is Bokeh’s counterpart to a DataFrame
from bokeh.models import HoverTool, ColumnDataSource
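A minimal sketch of the two classes working together, assuming toy data and made-up column names (minute, fixed): the tooltips refer to ColumnDataSource columns with the "@name" syntax.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Toy data, assumed for illustration only
source = ColumnDataSource(data=dict(
    minute=[5, 8, 12],
    fixed=[3, 7, 2],
))

# tools="" starts with no tools so that only our HoverTool is attached
fig = figure(title="Hover example", tools="")
fig.scatter(x="minute", y="fixed", size=12, source=source)

# "@name" refers to a column of the ColumnDataSource
hover = HoverTool(tooltips=[("Minute", "@minute"), ("Fixed", "@fixed")])
fig.add_tools(hover)
```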
Using pandas-bokeh
To use the Bokeh plotting backend on a pandas data frame, call the following.
dataframe.plot_bokeh()
Creating a Figure object for Bokeh
fig = figure()
''' Customizing code for plotting '''
show(fig)
Creating a chart with ColumnDataSource
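To use a ColumnDataSource with a renderer function, we need to pass at least 3 arguments: x (the name of the column holding the x values), y (the name of the column holding the y values), and source (the ColumnDataSource itself).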
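The following sketch shows a renderer call driven by a ColumnDataSource, assuming a small hypothetical data frame; x and y are column names, and source is the ColumnDataSource object.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical data frame standing in for real data
toy = pd.DataFrame({"minute": [5, 8, 12], "fixed": [3, 7, 2]})

# A ColumnDataSource can be built straight from a DataFrame
source = ColumnDataSource(toy)

fig = figure(title="ColumnDataSource example")
# x and y name columns of the source; source is the object itself
fig.line(x="minute", y="fixed", source=source)
```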
To write the output chart to a separate HTML file, run the following command
output_file('abc.html')
Bokeh themes are predefined sets of designs that you can apply to your plots. Bokeh provides five built-in themes: caliber, dark_minimal, light_minimal, night_sky, and contrast.
The below picture shows how the charts will look in the built-in themes. Here I have taken a line chart with different themes.
Run the below code for plotting charts using built-in themes.
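A minimal sketch of applying a built-in theme, assuming the theme name dark_minimal (one of the names in the Bokeh documentation) and toy line data; the theme is set on the current document through curdoc().

```python
from bokeh.io import curdoc
from bokeh.plotting import figure

# Assumed theme name; swap in another built-in name to compare looks
theme_name = "dark_minimal"

# Setting curdoc().theme applies the theme to plots in the current document
curdoc().theme = theme_name

fig = figure(title="Line chart with the " + theme_name + " theme",
             width=400, height=300)
fig.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)
```

Calling show(fig) afterwards renders the chart with the selected theme.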
To enhance the charts there are different properties which we can use. The three main groups of properties that objects have in common are line, fill, and text properties.
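As a hedged illustration of these property groups, the sketch below sets line and fill properties on a glyph, text properties on the title, and line properties on the plot outline. The colors and sizes are arbitrary choices for the example.

```python
from bokeh.plotting import figure

fig = figure(title="Property groups", width=400, height=300)

# Line and fill properties on a glyph
fig.scatter([1, 2, 3], [3, 1, 2], size=20,
            line_color="#3182bd", line_width=3,
            fill_color="#9ecae1", fill_alpha=0.6)

# Text properties on the title
fig.title.text_color = "#53777a"
fig.title.text_font_size = "16pt"

# Line properties on the plot outline
fig.outline_line_color = "grey"
fig.outline_line_width = 2
```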
Basic styling
I will show only the code required for each customization; you can add to it as needed. At the end, I will show a chart with demo code for a clearer understanding. There are many more properties; for a detailed explanation, see the official documentation.
Adding background color to the chart
fig = figure(background_fill_color="#fafafa")
To set the chart width and height, pass height and width to figure()
fig = figure(height=350, width=500)
Hiding the x-axis and y-axis of the chart
fig.axis.visible=False
Hiding the grid lines of the chart
fig.grid.grid_line_color = None
To change the opacity of the chart’s background color, we use alpha
fig.background_fill_alpha=0.3
To add a title to the chart, pass title to figure()
fig = figure(title="abc")
To add or change the x-axis and y-axis labels, run the following command
fig.xaxis.axis_label='X-axis'
fig.yaxis.axis_label='Y-axis'
Demo chart for simple stylings
x = list(range(11))
y0 = x
fig = figure(width=500, height=250, title='Title', background_fill_color="#fafafa")
fig.circle(x, y0, size=12, color="#53777a", alpha=0.8)
fig.grid.grid_line_color = None
fig.xaxis.axis_label='X-axis'
fig.yaxis.axis_label='Y-axis'
show(fig)
The steps to create a chart with the bokeh.plotting interface are:
Image 2
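A commonly described sequence for bokeh.plotting is: prepare the data, create a plot, add renderers, specify the output, and show or save the result. The sketch below follows that sequence with toy data; the file name lines.html and the temporary directory are assumptions made so that the example does not open a browser.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# 1. Prepare some data (toy values, assumed for illustration)
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# 2. Create a new plot
fig = figure(title="Simple line example", x_axis_label="x", y_axis_label="y")

# 3. Add renderers for the data, with visual customizations
fig.line(x, y, legend_label="Temp.", line_width=2)

# 4. Specify where to generate the output
out_path = os.path.join(tempfile.gettempdir(), "lines.html")
output_file(out_path)

# 5. Show or save the result (show(fig) would open it in a browser)
save(fig)
```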
The data we are going to work with is the popular Among Us dataset, which you can find on Kaggle.
Among Us: New craze
Among Us is the new craze among people who play mobile games; its popularity suddenly exploded and it became the hit video game of the pandemic. For all the Among Us fans, here is a brief description of how the game works. Among Us is a multiplayer game where between four and ten players are dropped into an alien spaceship. Each player has the role of either Imposter or Crewmate. The task of a Crewmate is to run around the spaceship completing all the assigned tasks while taking care not to be killed by an Imposter. Players can be voted off the ship, so each game becomes one of survival.
Let’s load the data and create one more feature, User ID, which tells us which user a row belongs to (user 1, user 2, etc.).
import glob

path = r'D:BlogsAnalytics_vidhyaAmong_Us'
all_files = glob.glob(path + "/*.csv")
li = []
usr = 0
for filename in all_files:
    usr += 1
    df = pd.read_csv(filename, index_col=None, header=0)
    df['User ID'] = usr
    li.append(df)
df = pd.concat(li, axis=0, ignore_index=True)
df[:2]
Note: This article doesn’t contain the EDA but shows how to work with different charts in Bokeh
Let’s see the distribution of data.
df.describe(include='O')
We will create a feature Min (minutes) by extracting it from Game Length.
df['Min'] = df.apply(lambda x : x['Game Length'].split(" ")[0] , axis = 1)
df['Min'] = df['Min'].replace('m', '', regex=True)
df['Min'][:2]
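The same extraction can also be written with pandas string methods. This is a sketch on toy values, assuming the Game Length strings look like "07m 04s"; the toy frame and the conversion to integers are additions for illustration.

```python
import pandas as pd

# Toy values in an assumed "MMm SSs" format
toy = pd.DataFrame({"Game Length": ["07m 04s", "16m 21s", "11m 33s"]})

# Take the part before the space, drop the trailing "m", convert to int
toy["Min"] = (
    toy["Game Length"]
    .str.split(" ").str[0]
    .str.replace("m", "", regex=False)
    .astype(int)
)
print(toy["Min"].tolist())
```

Converting to integers early avoids a separate astype() call before numeric charts such as histograms.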
Now, we will replace the values of the Murdered feature
df['Murdered'] = df['Murdered'].replace(['No', 'Yes', '-'], ['Not Murdered', 'Murdered', 'Missing'])
After completing the necessary cleaning steps, let’s first look at the basic charts in Bokeh.
Let’s check whether there are more Crewmates or Imposters in the game. We have data for a total of 2227 people.
df_team = df.Team.value_counts()
df_team.plot_bokeh(kind='pie', title='Ratio of Imposter vs Crewmate')
Pie chart in Bokeh
Interpret
As shown in the chart, 79% of players are Crewmates and 21% are Imposters, so the ratio of Imposters to Crewmates is roughly 1:4. Since Imposters are outnumbered, there is a chance that Crewmates win most of the games.
Let’s check how many players were murdered in the game. We will add two more features, Angle and Color, which we are going to use in the chart. This donut chart was taken from
from math import pi

df_mur = df.Murdered.value_counts().reset_index().rename(columns={'index': 'Murdered', 'Murdered': 'Value'})
df_mur['Angle'] = df_mur['Value']/df_mur['Value'].sum() * 2*pi
df_mur['Color'] = ['#3182bd', '#6baed6', '#9ecae1']
df_mur
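The Angle column gives each category its share of a full circle (2*pi radians). The sketch below reproduces that calculation on a made-up series. It builds the frame from the index and values of value_counts() explicitly, which is an assumption-free way to avoid depending on how a given pandas version names the columns after reset_index().

```python
from math import isclose, pi

import pandas as pd

# Toy column standing in for df['Murdered'] (values are made up)
murdered = pd.Series(["Missing", "Murdered", "Missing",
                      "Not Murdered", "Missing", "Murdered"])

counts = murdered.value_counts()
# Build the frame explicitly so the column names do not depend on
# how the installed pandas version names value_counts() output
toy_mur = pd.DataFrame({"Murdered": counts.index, "Value": counts.values})

# Each category gets a share of the full circle (2*pi radians)
toy_mur["Angle"] = toy_mur["Value"] / toy_mur["Value"].sum() * 2 * pi
print(toy_mur)
```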
We will use annular_wedge() to make a donut chart
from bokeh.transform import cumsum

fig = figure(height=350, title="Ratio of Murdered vs Not Murdered", toolbar_location=None,
             tools="hover", tooltips="@Murdered: @Value", x_range=(-.5, .5))
fig.annular_wedge(x=0, y=1, inner_radius=0.15, outer_radius=0.25, direction="anticlock",
                  start_angle=cumsum('Angle', include_zero=True), end_angle=cumsum('Angle'),
                  line_color="white", fill_color='Color', legend_field='Murdered', source=df_mur)
fig.axis.axis_label = None
fig.axis.visible = False
fig.grid.grid_line_color = None
show(fig)
Interpret
Among the recorded values, most people were murdered during the game, but most of the data is missing. So we can’t say that the majority of people were murdered during the game.
First, we will create a data frame of Sabotages Fixed against minutes, and rename the columns by appending T.
df_min = pd.crosstab(df['Min'], df['Sabotages Fixed']).reset_index()
df_min = df_min.rename(columns={0.0:'0T', 1.0:'1T', 2.0:'2T',3.0:'3T',4.0:'4T',5.0:'5T' })
df_min[:2]
df_0 = df_min[['Min', '0T']]
df_1 = df_min[['Min', '1T']]
df_2 = df_min[['Min', '2T']]
To make a simple scatter plot with only one legend, we can pass the data and use scatter() to make the chart.
df_min.plot_bokeh.scatter(x='Min', y='1T')
Scatter Chart in Bokeh
To make a scatter chart with more than one legend, we need to use circle(), which is a method of the figure object. The circle is one of the many marker styles provided by Bokeh; you can also use a triangle and many more.
fig = figure(title='Sabotages Fixed vs Minutes', tools='hover', toolbar_location="above")
fig.circle(x="Min", y='0T', size=12, alpha=0.5, color="#F78888", legend_label='0T', source=df_0)
fig.circle(x="Min", y='1T', size=12, alpha=0.5, color="blue", legend_label='1T', source=df_1)
fig.circle(x="Min", y='2T', size=12, alpha=0.5, color="#626262", legend_label='2T', source=df_2)
show(fig)
Scatter Chart in Bokeh
Let’s see the distribution of the minutes of the Among Us games. We will use hist to plot a histogram.
df_minutes = df['Min'].astype('int64')
df_minutes.plot_bokeh(kind='hist', title='Distribution of Minutes')
Histogram in Bokeh
Interpret
Most games last between 6 and 14 minutes.
Let’s see whether the number of Imposters and Crewmates decreases or increases as the game length increases. We will use hist to make a stacked histogram.
df_gm_te = pd.crosstab(df['Game Length'], df['Team'])
df_gm_te.plot_bokeh.hist(title='Game Length vs Imposter/Crewmate', figsize=(750, 350))
Stacked Histogram in Bokeh
Interpret
Imposters don’t tend to play the game for long; they just want to kill all the Crewmates and win the game.
Let’s see whether players complete the given tasks. If all the tasks are completed, the Crewmates automatically win.
df_tc = pd.DataFrame(df['Task Completed'].value_counts())[1:].sort_index().rename(columns={'Task Completed': 'Count'})
df_tc.plot_bokeh(kind='bar', y='Count', title='How many people have completed given task?', figsize=(750, 350))
Bar chart in Bokeh
Interpret
The most common number of completed tasks is 7 and the least common is 10.
Let’s see who wins: Imposters or Crewmates. I have always felt that Imposters win most often because they have only one job: to kill everyone.
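One possible way to draw a stacked bar chart of outcome by team is plain Bokeh’s vbar_stack(). The sketch below is not the code behind the chart in this article; the counts are made up and the column names (Team, Win, Loss) are assumptions for illustration.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up counts, NOT taken from the Among Us dataset
teams = ["Crewmate", "Imposter"]
outcomes = ["Win", "Loss"]
source = ColumnDataSource(data=dict(
    Team=teams,
    Win=[40, 15],
    Loss=[35, 10],
))

fig = figure(x_range=teams, title="Outcome by team (toy data)",
             width=500, height=300, toolbar_location=None)

# vbar_stack stacks one bar segment per column listed in `outcomes`
renderers = fig.vbar_stack(outcomes, x="Team", width=0.6,
                           color=["#3182bd", "#9ecae1"],
                           legend_label=outcomes, source=source)
fig.y_range.start = 0
```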
Stacked Bar Chart in Bokeh
Interpret
Imposters win more often than Crewmates. For Imposters, there is not much difference between wins and losses; the values are pretty close. There may be many cases with 5 Crewmates and 4 Imposters.
Let’s see whether completing the tasks wins the game.
df['All Tasks Completed'] = df['All Tasks Completed'].replace(['Yes','No'], ['Tasks Completed','Tasks Not Completed'])
df2 = pd.crosstab(df['Outcome'], df['All Tasks Completed'])
df2.plot_bokeh.barh(title='Completing tasks: win or loss', stacked=True, figsize=(650, 350))
Stacked Bar chart in Bokeh
Interpret
Finishing all the tasks automatically wins the game for the Crewmates. A higher number of people won the game by completing the tasks.
Let’s see whether users won or were defeated, using a bi-directional bar chart. To make a bi-directional bar chart we need to make one measure negative; here we will make the Loss feature negative.
df_user = pd.crosstab(df['User ID'], df['Outcome']).reset_index()
df_user['Loss'] = df_user['Loss']*-1
df_user['User ID'] = (df_user.index+1).astype(str) + ' User'
df_user = df_user.set_index('User ID')
df_user[:2]
After completing the above process, we just need to use barh() to make a bar chart in both directions.
df_user.plot_bokeh.barh(title='Users: Won or Defeat')
Interpret
From the chart, we can easily tell whether a user won or was defeated.
Let’s see the ejected ratio of Crewmates from the game. We will use line to make a line chart.
df_crewmate = df[df['Team'] == 'Crewmate']
df_t_ej = pd.crosstab(df_crewmate['User ID'], df_crewmate['Ejected']).reset_index()
df_t_ej = df_t_ej[['No','Yes']]
df_t_ej.plot_bokeh.line(title='Crewmate Members: Ejected vs Minutes', figsize=(750, 350))
Line Chart in Bokeh
Interpret
There is a high variance in members not being ejected from the game.
Let’s visualize the Top 10 Users by wins. I have added a User string to every user id. The data frame looks like this.
df_user_new = pd.crosstab(df['User ID'], df['Outcome']).reset_index().sort_values(by='Win', ascending=False)[:10]
df_user_new['User ID'] = (df_user_new.index+1).astype(str) + ' User'
df_user_new[:2]
In this chart, we will remove the x-axis and y-axis grid lines. To make a lollipop chart we need to combine segment() and circle().
x = df_user_new['Win']
factors = df_user_new['User ID'] #.values
fig = figure(title="Top 10 Users: Win", toolbar_location=None, tools="hover", tooltips="@x",
             y_range=factors, x_range=[0,75], width=750, height=350)
fig.segment(0, factors, x, factors, line_width=2, line_color="#3182bd")
fig.circle(x, factors, size=15, fill_color="#9ecae1", line_color="#3182bd", line_width=3)
fig.xgrid.grid_line_color = None
fig.ygrid.grid_line_color = None
show(fig)
Let’s take a look at how many sabotages were fixed over time (minutes). For simplicity, we will look at only two sabotage counts, the 0th and 1st.
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

# data
df_min = pd.crosstab(df['Min'], df['Sabotages Fixed']).reset_index()
df_min = df_min.rename(columns={0.0:'0T', 1.0:'1T', 2.0:'2T',3.0:'3T',4.0:'4T',5.0:'5T' })

# chart
names = ['0T','1T']
source = ColumnDataSource(data=dict(
    x = df_min.Min,
    y0 = df_min['0T'],
    y1 = df_min['1T']
))
fig = figure(width=400, height=400, title='Sabotages Fixed vs Minutes')
fig.varea_stack(['y0','y1'], x='x', color=("grey", "lightgrey"), legend_label=names, source=source)
fig.grid.grid_line_color = None
fig.xaxis.axis_label='Minutes'
show(fig)
Interpret
As game time increases, fewer sabotages are fixed.
So far we have seen all the basic charts in Bokeh; now let’s see how to work with layouts. This will help us create a dashboard or an application, so that we have all the information for a particular use case in one place.
Layout functions let us build a grid of plots and widgets. We can have as many rows, columns, or grids of plots as we like in one layout.
There are many layout options available:
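Three of the layout helpers in bokeh.layouts are row(), column(), and gridplot(). The sketch below arranges a few placeholder plots with each of them; the helper function small_plot and the toy data are hypothetical and exist only for this example.

```python
from bokeh.layouts import column, gridplot, row
from bokeh.plotting import figure


def small_plot(title):
    # Helper used only in this sketch
    fig = figure(title=title, width=200, height=200)
    fig.line([0, 1, 2], [0, 1, 0])
    return fig


a, b, c, d = (small_plot(name) for name in "ABCD")

side_by_side = row(a, b)   # plots placed horizontally
stacked = column(c, d)     # plots placed vertically

# gridplot arranges plots in a grid and merges their toolbars;
# None leaves a cell empty
e, f, g = (small_plot(name) for name in "EFG")
grid_layout = gridplot([[e, f], [None, g]])
```

Passing any of these layout objects to show() renders the whole arrangement at once.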
Let’s take some dummy data
from bokeh.io import output_file, show
from bokeh.layouts import column, row
from bokeh.plotting import figure

output_file("layout.html")
x = list(range(11))
y0 = x
y1 = [10 - i for i in x]
y2 = [abs(i - 5) for i in x]

# create three plots
s1 = figure(width=250, height=250, background_fill_color="#fafafa")
s1.circle(x, y0, size=12, color="#53777a", alpha=0.8)
s2 = figure(width=250, height=250, background_fill_color="#fafafa")
s2.triangle(x, y1, size=12, color="#c02942", alpha=0.8)
s3 = figure(width=250, height=250, background_fill_color="#fafafa")
s3.square(x, y2, size=12, color="#d95b43", alpha=0.8)
If we use the column() function the output will look like this.
show(column(s1, s2, s3))
If we use the row() function the output will look like this
# put the results in a row and show
show(row(s1, s2, s3))
Let’s make a dashboard layout in Bokeh. Here I have taken three charts: one is a lollipop chart, and the other two are pie (donut) charts.
The main logic of setting a layout in Bokeh is deciding how we want to arrange the charts. Let’s create a design like the one in the picture below.
layout = grid([ [fig1], [fig2, fig3] ])
The whole code to run a Dashboard Layout in Bokeh
from math import pi
from bokeh.io import output_file, show
from bokeh.plotting import figure
from bokeh.layouts import column, grid
from bokeh.transform import cumsum

# 1 layout
df_user_new = pd.crosstab(df['User ID'], df['Outcome']).reset_index().sort_values(by='Win', ascending=False)[:10]
df_user_new['User ID'] = (df_user_new.index+1).astype(str) + ' User'
x = df_user_new['Win']
factors = df_user_new['User ID']
fig1 = figure(title="Top 10 Users: Win", toolbar_location=None, tools="hover", tooltips="@x",
              y_range=factors, x_range=[0,75], width=700, height=250)
fig1.segment(0, factors, x, factors, line_width=2, line_color="#3182bd")
fig1.circle(x, factors, size=15, fill_color="#9ecae1", line_color="#3182bd", line_width=3)

# 2 layout
df_mur = df.Murdered.value_counts().reset_index().rename(columns={'index': 'Murdered', 'Murdered': 'Value'})
df_mur['Angle'] = df_mur['Value']/df_mur['Value'].sum() * 2*pi
df_mur['Color'] = ['#3182bd', '#6baed6', '#9ecae1']
fig2 = figure(height=300, width=400, title="Ratio of Murdered vs Not Murdered", toolbar_location=None,
              tools="hover", tooltips="@Murdered: @Value", x_range=(-.5, .5))
fig2.annular_wedge(x=0, y=1, inner_radius=0.15, outer_radius=0.25, direction="anticlock",
                   start_angle=cumsum('Angle', include_zero=True), end_angle=cumsum('Angle'),
                   line_color="white", fill_color='Color', legend_field='Murdered', source=df_mur)

# 3 layout
df_team = pd.DataFrame(df.Team.value_counts()).reset_index().rename(columns={'index': 'Team', 'Team': 'Value'})
df_team['Angle'] = df_team['Value']/df_team['Value'].sum() * 2*pi
df_team['Color'] = ['#3182bd', '#6baed6']
fig3 = figure(height=300, width=300, title="Ratio of Crewmates vs Imposter", toolbar_location=None,
              tools="hover", tooltips="@Team: @Value", x_range=(-.5, .5))
fig3.annular_wedge(x=0, y=1, inner_radius=0.15, outer_radius=0.25, direction="anticlock",
                   start_angle=cumsum('Angle', include_zero=True), end_angle=cumsum('Angle'),
                   line_color="white", fill_color='Color', legend_field='Team', source=df_team)

# Styling
for fig in [fig1, fig2, fig3]:
    fig.grid.grid_line_color = None
for fig in [fig2, fig3]:
    fig.axis.visible=False
    fig.axis.axis_label=None

layout = grid([
    [fig1],
    [fig2, fig3]
])
show(layout)
In this article, we saw what Bokeh is and how to work with different charts, from simple to advanced. We also saw how to arrange charts in a layout.
References:
Image 1: https://bokeh.org/branding/
Image 2: https://www.datacamp.com/community/blog/bokeh-cheat-sheet-python