“Visualization gives you answers to questions you didn’t know you had.” – Ben Shneiderman
My day-to-day work as a Data Scientist requires a great deal of experimentation. That means I rely a lot on data visualization to explore the dataset I’m working on.
And I couldn’t relate more to Ben Shneiderman’s quote! Data visualization gives me answers to questions I hadn’t even considered before. After all, a picture is worth a thousand data points!
This naturally leads to the million-dollar question – which Python library should you use for data visualization? There are quite a few to choose from, and even seasoned data scientists can get lost in the sea of features that each one has to offer.
That’s why I wanted to write this article espousing the advantages and unique features of the different data visualization Python libraries. We will cover some of the most amazing libraries for visualization that Python supports. Each of these libraries possesses its own flair and is really useful for a particular kind of visualization task.
So, without further ado, let’s start!
If you’re new to Python and/or data visualization, I suggest checking out the below resources by Analytics Vidhya:
Chances are you’ve already used matplotlib in your data science journey. From beginners in data science to experienced professionals building complex data visualizations, matplotlib is usually the default visualization Python library data scientists turn to.
Matplotlib is known for the great flexibility it provides as a 2-D plotting library in Python. If you have a MATLAB programming background, you’ll find Matplotlib’s pyplot interface very familiar. You’ll be off with your first visualization in no time at all!
Matplotlib can be used in multiple ways in Python, including Python scripts, the Python and IPython shells, and Jupyter Notebooks. This is why it’s used not just by data scientists but also by researchers who need publication-quality graphs.
Matplotlib supports all the popular charts (line plots, histograms, power spectra, bar charts, error charts, scatterplots, etc.) right out of the box. There are also extensions that you can use to create advanced visualizations such as 3-D plots.
What I personally like about matplotlib is that because it’s so flexible, it lets the user control the visualization at the most granular level, from a single line or dot in the graph to the entire chart. In other words, almost every element of a plot can be customized.
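To make that granular control a bit more concrete, here is a minimal sketch. The data points, labels, and styling choices are made up purely for illustration; the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Illustrative data only
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots(figsize=(6, 4))

# Control a single line: colour, width, dash pattern, and marker
(line,) = ax.plot(x, y, color="tab:red", linewidth=2, linestyle="--", marker="o")

# Control a single point: annotate the last value with an arrow
ax.annotate("last point", xy=(4, 16), xytext=(1.5, 14),
            arrowprops={"arrowstyle": "->"})

# Control the chart as a whole: title, axis labels, and grid
ax.set_title("y = x squared")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.grid(True, alpha=0.3)
```

From here, `fig.savefig(...)` writes the figure to disk and `plt.show()` displays it in an interactive session.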
Here are some useful tutorials to learn matplotlib:
Here’s Matplotlib’s creator giving an introductory tutorial:
When I look at visualizations built by Seaborn, only one word comes to mind – beautiful! Seaborn is built on top of matplotlib and provides a very simple yet intuitive interface for building visualizations. When using Seaborn, you will also notice that many of the default settings in the plots work quite well right out of the box.
The first unique feature of Seaborn is that it is designed so that you write far less code to achieve high-grade visualizations. Here is an example of this simplicity. Notice how we can create a complex visualization with just a single line of plotting code:
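As a minimal sketch of that idea, assume a small made-up DataFrame (the column names and values below are purely illustrative). A single `relplot` call then maps five variables at once: x position, y position, colour, marker size, and facet column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical restaurant-bill data, invented for this sketch
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 23.7, 24.6, 25.3, 8.8, 26.9, 15.0],
    "tip": [1.7, 3.5, 3.3, 3.6, 4.7, 2.0, 3.1, 2.0],
    "party_size": [2, 3, 2, 4, 4, 2, 4, 2],
    "smoker": ["No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes"],
    "time": ["Lunch", "Dinner", "Dinner", "Dinner", "Lunch", "Lunch", "Dinner", "Lunch"],
})

# The single line of plotting code: one scatter panel per value of "time"
g = sns.relplot(data=df, x="total_bill", y="tip",
                hue="smoker", size="party_size", col="time")
```

The call returns a `FacetGrid`, so the result can still be saved with `g.savefig(...)` or tweaked further.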
The second useful feature of Seaborn is that it supports a plethora of advanced plots, such as categorical plots (catplot), distribution plots with KDE (distplot, which newer Seaborn releases deprecate in favor of displot and histplot), swarm plots, etc., right out of the box. And of course, we saw one example of relplot above.
Now, because Seaborn is built on top of matplotlib, it is highly compatible with it. So that means when building visualizations, you can start with advanced plots that seaborn already supports and then customize them as much as you want with the help of matplotlib.
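Here is a rough sketch of that workflow, again with invented data: Seaborn draws the box plot onto a matplotlib `Axes`, and plain matplotlib calls then adjust the title, the axis label, and add a reference line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical tips per day, invented for this sketch
df = pd.DataFrame({
    "day": ["Thu", "Thu", "Thu", "Fri", "Fri", "Fri", "Sat", "Sat", "Sat"],
    "tip": [2.0, 2.5, 3.1, 1.8, 3.0, 3.4, 2.2, 4.1, 5.0],
})

fig, ax = plt.subplots(figsize=(6, 4))

# Step 1: let Seaborn build the advanced plot on our Axes
sns.boxplot(data=df, x="day", y="tip", ax=ax)

# Step 2: customize the same Axes with ordinary matplotlib calls
ax.set_title("Tips by day")
ax.set_ylabel("Tip (USD)")
ax.axhline(df["tip"].mean(), color="gray", linestyle=":")
```

Passing `ax=ax` is what ties the two libraries together here: Seaborn's axes-level functions draw on whatever matplotlib `Axes` they are given.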
Here are some helpful resources that you can utilize to start using the seaborn library for data visualization:
Bokeh is a library designed to generate visualizations that work well in web pages and browsers. That is exactly what this visualization library targets.
You will also notice that the visualizations generated with Bokeh are interactive, which means you can convey information in a more intuitive way through your plots.
Bokeh supports unique visualizations like geospatial plots, network graphs, etc. right out of the box. If you want to show these visualizations in a browser, there are options available to export them, and you can also use Bokeh from JavaScript itself!
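A small sketch of the browser-oriented workflow, using invented data and an arbitrary choice of tools: the figure gets pan, zoom, and hover tools, and `file_html` renders it to a standalone HTML string that any browser can open.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Illustrative data only
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# The tools string controls which interactions the viewer gets
p = figure(title="Interactive line chart",
           x_axis_label="x", y_axis_label="y",
           tools="pan,wheel_zoom,box_zoom,reset,hover")
p.line(x, y, line_width=2)
p.scatter(x, y, size=8)

# Export: a self-contained HTML page that loads BokehJS from the CDN
html = file_html(p, CDN, title="Bokeh example")
```

Writing `html` to a file gives you something you can email or host; in a notebook, `bokeh.io.show(p)` displays the same plot inline.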
Here is a nice tutorial to learn Bokeh for data visualization:
Altair is a declarative library for data visualization. Its principle is that rather than focusing on the code, you should focus on the visualization and write as little code as possible while still creating beautiful and intuitive plots. That’s right up my alley!
Since Altair uses a declarative style to create plots, it becomes very easy to iterate quickly through visualizations and experiments when using this library.
Here is a good introduction to Altair in Python:
The first thing that comes to my mind when I think about Plotly is interactivity! This data visualization library is by far my go-to library whenever I want to create visualizations that need to be highly interactive for the user.
Just check out this visualization created using Plotly:
Plotly is highly compatible with Jupyter Notebooks and web browsers. This means whatever interactive plots you create can easily be shared in the same form with your teammates or end users.
I also want to point out that Plotly supports a whole gamut of plots: basic chart types, Seaborn-like beautiful and advanced plots, 3-D plots, map-based visualizations, scientific plots, etc. The list is endless!
Plotly’s plots also support animation. So, it’s a pretty useful library if you want to do storytelling through visualizations.
Here are a couple of tutorials to get you up and running with Plotly for data visualization:
ggplot is the Python version of the famous ggplot2 of R and the Grammar of Graphics language. If you have used it in R before, you will know just how simple it is to create plots using this library.
I personally love the flexibility of ggplot. We can easily wrangle data while building plots on the fly – a super useful concept!
ggplot is also a declarative-style library like Altair, and it is tightly coupled with Pandas. This means you can easily build visualizations directly from your Pandas dataframe!
You can learn more about ggplot and how to work with it here:
In this article, we explored some of the must-know libraries for performing data visualization in Python. Each of these libraries is quite popular in its own right and shines in different scenarios.
I hope this article serves as a Rosetta Stone when you are deciding which library to use for your next project.
Do you think any other data visualization library should be on this list? Did you like the article? If yes, comment below!