The waterfall chart, also known as a floating bricks or flying bricks chart, is a two-dimensional visualization. It is a powerful tool for analyzing incremental positive and negative changes over time or across multiple steps. As Anthony T. Hincks humorously notes, waterfalls can take on diverse forms. In this article, we look at why waterfall charts matter and show how to create them with libraries like Matplotlib and Plotly.
This article was published as a part of the Data Science Blogathon.
The waterfall chart is frequently used in financial analysis to understand the positive and negative effects of multiple factors on a particular asset. The chart can be either time-based or category-based. Category-based charts show gains or losses across categories such as expenses, sales, or any other variable with sequential positive and negative values. Time-based charts show gains or losses over a time period.
A waterfall chart is usually laid out horizontally. The first column starts at the horizontal axis, and the following values appear as a series of floating columns that represent positive or negative changes. Sometimes the columns are connected with lines.
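The floating-column idea can be made concrete with a few lines of plain Python. This is a minimal sketch with toy numbers (not taken from the article): each column starts at the running total before its change and spans the change itself, so a negative change produces a column with a negative height. The function name `floating_columns` is hypothetical.

```python
def floating_columns(changes):
    """Return the (bottom, height) of each floating column and the final total.

    Each column starts at the running total before its change and spans the
    change itself, so a negative change yields a negative height.
    """
    columns = []
    running_total = 0
    for change in changes:
        columns.append((running_total, change))
        running_total += change
    return columns, running_total


# toy numbers for illustration only
columns, total = floating_columns([100, -30, 50, -20])
print(columns)  # [(0, 100), (100, -30), (70, 50), (120, -20)]
print(total)    # 100
```

Any plotting library that accepts a bar bottom or base, such as Matplotlib's `bar(..., bottom=...)`, can draw these columns directly.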
Let’s take an example to understand when and where to use waterfall charts, since building one is not difficult. We will use some dummy data and a Kaggle dataset to build waterfall charts.
Suppose I give you two things: a pandas table (not a plain one, but a styled one) and a waterfall chart. Which one is easier to read?
The table below holds one week of sales data. I styled it as a heatmap with the pandas Styler method background_gradient, using a green colormap from seaborn's light_palette:
import pandas as pd
import seaborn as sns
# data: dummy sales values for one week
a = ['mon','tue','wed','thu','fri','sat','sun']
b = [10,-30,-7.5,-25,95,-7,45]
df2 = pd.DataFrame({'day': a, 'sales': b})
# table: shade the numeric column with a green gradient
cm = sns.light_palette("green", as_cmap=True)
df2.style.background_gradient(cmap=cm)
Now, look at the table and the waterfall chart side by side.
The table lists the values in order, but it takes effort to read and compare them. In the waterfall chart, on the other hand, you can see at a glance that the yellow bars show decreases and the red bars show increases.
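To reproduce a waterfall chart of this weekly data without any waterfall-specific library, here is a minimal Matplotlib sketch. It is an assumption, not the author's code: each day is drawn with `ax.bar`, using a `bottom` equal to the running total before that day, with increases in red and decreases in gold (yellow) to match the colors described above.

```python
import matplotlib.pyplot as plt
import numpy as np

# same dummy weekly sales data as the table
days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
sales = np.array([10, -30, -7.5, -25, 95, -7, 45])

# each bar floats: it starts at the running total before that day's change
bottoms = np.concatenate(([0.0], np.cumsum(sales)[:-1]))
# red for increases, gold (yellow) for decreases
colors = ['red' if v >= 0 else 'gold' for v in sales]

fig, ax = plt.subplots(figsize=(7, 3.5))
bars = ax.bar(days, sales, bottom=bottoms, color=colors)
ax.axhline(0, color='grey', linewidth=0.8)  # reference line at zero
ax.set_title('Weekly sales: waterfall sketch')
fig.tight_layout()
```

In a notebook the figure renders inline; in a plain script, call `plt.show()` to display it.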
The data we are going to use is the Netflix Movies and TV Shows dataset from Kaggle; the notebook can be found here.
We are going to use Plotly, an open-source charting library.
import pandas as pd
import plotly.graph_objects as go
df = pd.read_csv(r'D:/netflix_titles.csv')
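Converting date_added into a proper datetime format and extracting the year and month:
# strip stray whitespace first; some date_added entries may carry a leading space
df["date_added"] = pd.to_datetime(df["date_added"].str.strip())
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
df.head(3)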
Let’s prepare the data
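d2 = df[df["type"] == "Movie"]
col = "year_added"
# count movies per year; rename_axis/reset_index gives the columns [year_added, count] in both pandas 1.x and 2.x
vc2 = d2[col].value_counts().rename_axis(col).reset_index(name="count")
# share of all movies added in each year
vc2["percent"] = 100 * vc2["count"] / vc2["count"].sum()
vc2 = vc2.sort_values(col)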
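Since the Kaggle CSV may not be at hand, here is a minimal sketch of what this preparation step produces, run on a hypothetical toy frame (the years and counts are made up, not the Netflix figures). It uses `rename_axis(col).reset_index(name="count")`, a version-independent way to turn `value_counts()` into a two-column table; missing years (NaN) are dropped by `value_counts` by default.

```python
import numpy as np
import pandas as pd

# hypothetical toy frame standing in for the Netflix movies
d2 = pd.DataFrame({"year_added": [2019.0, 2020.0, 2019.0, np.nan, 2021.0, 2019.0]})
col = "year_added"

# one row per year with its number of titles
vc2 = d2[col].value_counts().rename_axis(col).reset_index(name="count")
# each year's share of all counted titles
vc2["percent"] = 100 * vc2["count"] / vc2["count"].sum()
vc2 = vc2.sort_values(col)
print(vc2)
```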
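Now we will make a waterfall chart of movies added over the years with the Plotly trace go.Waterfall(). The x, text and y values are hardcoded: text holds each year's count, and y holds the same count, made negative for years where it is lower than the previous year's, so those years are drawn as decreasing bars.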
fig2 = go.Figure(go.Waterfall(
name = "Movie", orientation = "v",
x = ["2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021"],
textposition = "auto",
text = ["1", "2", "1", "13", "3", "6", "14", "48", "204", "743", "1121", "1366", "1228", "84"],
y = [1, 2, -1, 13, -3, 6, 14, 48, 204, 743, 1121, 1366, -1228, -84],
connector = {"line":{"color":"#b20710"}},
increasing = {"marker":{"color":"#b20710"}},
decreasing = {"marker":{"color":"orange"}},
))
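Because the lists above are hardcoded, they must be edited by hand whenever the data changes. A minimal sketch that derives them from vc2 instead follows; `waterfall_inputs` is a hypothetical helper, and it assumes vc2 has one row per year with the columns year_added and count, sorted by year. It reproduces the convention the hardcoded y list follows: a year's count is negated when it is lower than the previous year's count.

```python
import pandas as pd


def waterfall_inputs(vc, col="year_added"):
    """Hypothetical helper: build the x, text and y lists for go.Waterfall.

    A year's count becomes negative when it is lower than the previous
    year's count, matching the hardcoded y list in the article.
    """
    years = vc[col].astype(int).astype(str).tolist()
    counts = vc["count"].astype(int).tolist()
    y = counts[:1]
    for prev, cur in zip(counts, counts[1:]):
        y.append(-cur if cur < prev else cur)
    text = [str(c) for c in counts]
    return years, text, y


# usage with the counts from the article's hardcoded text list
vc_demo = pd.DataFrame({
    "year_added": [float(year) for year in range(2008, 2022)],
    "count": [1, 2, 1, 13, 3, 6, 14, 48, 204, 743, 1121, 1366, 1228, 84],
})
x, text, y = waterfall_inputs(vc_demo)
print(y)
```

With the real data, the three lists can be passed straight to go.Waterfall(x=x, text=text, y=y, ...). This sign convention is the article's own choice; a more conventional waterfall of the same data would plot year-over-year differences (for example vc2["count"].diff()) as relative changes.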
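Let’s go through each parameter one by one:
name: the trace name, shown in the legend and in hover labels.
orientation: "v" draws vertical bars ("h" would draw horizontal ones).
x: the categories along the x-axis, here the years.
y: the value of each bar; by default every value is treated as a relative change from the running total.
text and textposition: the labels drawn on the bars; "auto" places each label inside its bar if it fits and outside otherwise.
connector: the style of the lines that join consecutive bars.
increasing and decreasing: the marker styles of bars that raise or lower the running total.
To make the chart elegant, we give colors to the bars and to their connector lines: increasing bars and connectors are dark red (#b20710), and decreasing bars are orange.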
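Besides the parameters used above, go.Waterfall accepts a measure list whose entries are "relative" (the default), "absolute" or "total"; this is how a chart can start from an initial value and end with a computed total bar. The sketch below is a simplified pure-Python model of those three measures, not Plotly's implementation, applied to hypothetical profit-bridge numbers. The function name `bar_spans` is illustrative only.

```python
def bar_spans(y, measure):
    """Simplified model of how each waterfall measure positions a bar.

    Returns (start, end) for every bar:
      relative - floats from the running total and moves it by y
      absolute - drawn from 0 to y and resets the running total to y
      total    - drawn from 0 to the current running total (y is ignored)
    """
    spans = []
    running_total = 0
    for value, kind in zip(y, measure):
        if kind == "relative":
            spans.append((running_total, running_total + value))
            running_total += value
        elif kind == "absolute":
            spans.append((0, value))
            running_total = value
        elif kind == "total":
            spans.append((0, running_total))
        else:
            raise ValueError(f"unknown measure: {kind!r}")
    return spans


# hypothetical profit bridge: starting profit, two gains, one expense, final total
x = ["Start", "Sales growth", "Cost savings", "Expenses", "Final"]
y = [100, 40, 15, -60, 0]  # 0 is only a placeholder for the total bar
measure = ["absolute", "relative", "relative", "relative", "total"]
print(bar_spans(y, measure))
# [(0, 100), (100, 140), (140, 155), (155, 95), (0, 95)]
```

In Plotly, the same lists would go to go.Waterfall(x=x, y=y, measure=measure), and the total bars can be styled through the totals parameter alongside increasing and decreasing.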
The chart already looks good, but let’s make it more attractive:
# remove the grid lines and hide the y-axis
fig2.update_xaxes(showgrid=False)
fig2.update_yaxes(showgrid=False, visible=False)
# leave hovertemplate unset so Plotly uses its default hover text
fig2.update_traces(hovertemplate=None)
# title, size, margins, dark background and font colors
fig2.update_layout(title='Watching Movies over the year', height=350,
margin=dict(t=80, b=20, l=50, r=50),
hovermode="x unified",
xaxis_title=' ', yaxis_title=" ",
plot_bgcolor='#333', paper_bgcolor='#333',
title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
font=dict(color='#8a8d93'))
fig2.show()
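Now it looks much cleaner. Here is what these settings do:
showgrid=False removes the grid lines, and visible=False hides the y-axis entirely, since the bar labels already show the values.
hovertemplate=None keeps Plotly's default hover text, and hovermode="x unified" shows a single hover label for everything at the hovered x position.
height sets the figure height and margin sets the top, bottom, left and right margins, both in pixels; xaxis_title and yaxis_title are blanked out.
plot_bgcolor and paper_bgcolor give the plotting area and the whole figure a dark background.
title_font and font set the size, color and family of the title and of the remaining text.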
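The full code:
import pandas as pd
import plotly.graph_objects as go
df = pd.read_csv(r'D:/netflix_titles.csv')
df["date_added"] = pd.to_datetime(df["date_added"].str.strip())
df["year_added"] = df["date_added"].dt.year
d2 = df[df["type"] == "Movie"]
col = "year_added"
vc2 = d2[col].value_counts().rename_axis(col).reset_index(name="count")
vc2["percent"] = 100 * vc2["count"] / vc2["count"].sum()
vc2 = vc2.sort_values(col)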
fig2 = go.Figure(go.Waterfall(
name = "Movie", orientation = "v",
x = ["2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021"],
textposition = "auto",
text = ["1", "2", "1", "13", "3", "6", "14", "48", "204", "743", "1121", "1366", "1228", "84"],
y = [1, 2, -1, 13, -3, 6, 14, 48, 204, 743, 1121, 1366, -1228, -84],
connector = {"line":{"color":"#b20710"}},
increasing = {"marker":{"color":"#b20710"}},
decreasing = {"marker":{"color":"orange"}},
))
fig2.update_xaxes(showgrid=False)
fig2.update_yaxes(showgrid=False, visible=False)
fig2.update_traces(hovertemplate=None)
fig2.update_layout(title='Watching Movies over the year', height=350,
margin=dict(t=80, b=20, l=50, r=50),
hovermode="x unified",
xaxis_title=' ', yaxis_title=" ",
plot_bgcolor='#333', paper_bgcolor='#333',
title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
font=dict(color='#8a8d93'))
fig2.show()
Installing the waterfallcharts library using pip:
!pip install waterfallcharts
Importing the libraries:
import pandas as pd
import waterfall_chart
import matplotlib.pyplot as plt
%matplotlib inline
Let’s plot a waterfall chart of the weekly sales data:
a = ['mon','tue','wed','thu','fri','sat','sun']
b = [10,-30,-7.5,-25,95,-7,45]
waterfall_chart.plot(a, b);
Looking closely at the chart, by default the bars with positive values are green, the bars with negative values are red, and the total bar is blue.
Adding some parameters to the chart:
waterfall_chart.plot(a, b, net_label='Total', rotation_value=360)
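The parameters of the chart: net_label sets the label of the final total bar (the library labels it "net" by default), and rotation_value sets the rotation of the x-axis tick labels in degrees; 360 is a full turn, so the labels end up horizontal.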
In conclusion, the waterfall chart makes incremental changes easy to follow: it shows how each positive or negative step, over time or across categories, moves a running total. Whether you are tracking financial performance or analyzing project progress, it shows at a glance where a change came from. As you delve deeper into data visualization, consider expanding your skills with our BlackBelt program, which covers waterfall charts and many other data visualization techniques.
A. A waterfall chart visualizes incremental changes in a total value, displaying the impact of positive and negative contributions over time or steps. It helps in understanding how different factors contribute to the final result.
A. An example of a waterfall chart could be tracking a company’s annual profit. It would show the initial profit, followed by positive factors like increased sales and cost reductions, and negative factors like expenses, resulting in the final profit.
A. Yes, Excel offers the option to create waterfall charts. It’s a popular tool for generating this type of visualization, allowing users to display the cumulative effects of various values.
A. In data visualization, a waterfall chart represents the cumulative impact of sequentially introduced positive and negative values on an initial point. It’s used to depict how different factors contribute to an outcome, making it easier to comprehend complex changes.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.