How To Perform Data Visualization with Pandas

Neelu | Last Updated: 16 Oct, 2024


This article was published as a part of the Data Science Blogathon.

Introduction

Data visualization is one of the most important steps in the life cycle of data science, data analytics and data engineering. Presenting a study or analysis with colours and graphics makes it more engaging and easier to understand. Visual elements such as graphs, charts and maps make it easier for clients to see the underlying structure, trends, patterns and relationships among the variables in a dataset. Explaining a data summary and analysis with plain numbers alone is hard to follow for people from both technical and non-technical backgrounds. Data visualization gives a clear picture of what the data is telling us and makes its insights accessible to everyone.

Data visualization involves processing large amounts of data and converting it into meaningful, informative visuals using various tools. To do this we need software that can handle structured and unstructured data from sources such as files, web APIs and databases. We should choose the visualization tool that best fits our requirements: it should support interactive plots, connect to and combine data sources, refresh data automatically, provide secure access to data sources, and export widgets. Together, these features help us build better visuals of our data and save time.

Advantages of Data Visualization:

Easy to understand: Managers and decision-makers use data visualization tools to create plots quickly and absorb important metrics at a glance. These metrics show clear growth or loss in the business. For example, if sales are dropping significantly in one region, decision-makers can quickly see from the data which circumstances or decisions are responsible and how to respond. Graphical representations let us interpret many features of the data clearly and cohesively, so we can understand it, draw conclusions from its insights and assess the business outlook.
Quick Decision Making: The human mind processes visual images faster than text and numbers, so a graph or chart is easier for the brain to take in. Reading text and mentally converting it into a picture of the data is slow for a team of decision-makers and can be inaccurate. Because people interpret visual data so readily, data visualization speeds up decision-making, helping to shorten business meetings and make decisions more efficient.
Better Analysis: Data visualization helps business stakeholders analyze reports on sales, marketing strategies and product interest. Better analysis focuses attention on the areas that need improvement, leading to strategies that increase profits and make the business more productive.
Identifying patterns: Large amounts of complex data offer many opportunities for insight once they are visualized. Visualization lets business users recognize relationships and patterns in the data, which gives it greater meaning. Exploring these patterns helps users concentrate on the areas of the data that need attention and judge how important those areas are for driving the business forward.
Detecting Errors: Visualizing data helps us spot errors in it quickly. If the data suggest incorrect actions, visualization helps us detect the inaccurate values early so they can be removed from the analysis.
Exploring business insights: In today’s competitive business environment, finding correlations in data through visual representations is essential for identifying business insights. Exploring these insights helps business users and executives set the right course toward their business goals.
Efficient Storytelling: Data visualization is valued as a way of presenting data to produce insights that support better decisions, i.e., telling the story behind the data. It can highlight key factors, draw attention to crucial insights and give a business a visual edge over its competitors.
Discovery of Latest Trends within the Market: Using data visualization, we can discover the latest trends in our business, produce quality products and identify issues before they arise. By staying on top of trends, we can focus our effort on increasing profits.

Data Visualization with Pandas:

The Pandas library in Python is mainly used for data analysis. It is not a dedicated data visualization library, but we can create basic plots with it. Pandas is highly practical for exploratory data analysis plots because we do not need to import a separate visualization library for such tasks: its plotting methods use Matplotlib behind the scenes by default, so Matplotlib only needs to be installed.

As Python’s popular data analysis library, Pandas provides several plotting functions through the .plot() method. Another advantage of using Pandas for visualization is that we can chain data analysis functions and plotting functions into a single pipeline, which simplifies the task.
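A minimal sketch of such a pipeline, assuming a seeded random DataFrame shaped like the one used in this article (the seed, the 3-row window and the title are illustrative choices, not part of the original): a rolling mean is computed and plotted in one chained expression. The Agg backend is selected only so the snippet also runs outside a notebook; in Colab you can omit those two lines.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook such as Colab
import numpy as np
import pandas as pd

# Illustrative data: 10 rows x 4 columns of random values in [0, 1), seeded for repeatability
df = pd.DataFrame(np.random.default_rng(0).random((10, 4)),
                  columns=["col_1", "col_2", "col_3", "col_4"])

# Analysis and plotting chained into one pipeline:
# 3-row rolling mean -> drop the first 2 rows (incomplete windows are NaN) -> line plot
ax = (
    df.rolling(window=3)
      .mean()
      .dropna()
      .plot(title="3-row rolling mean")
)
```

Each method returns a new object, so intermediate results never need their own variable names; only the final Axes is kept.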

Let’s see how to visualize data with Pandas through a practical implementation that covers each type of plot.

To visualize the data, we will create a DataFrame with 4 columns of random values using NumPy’s np.random.rand() function. The IDE we are using is Google Colab. Let’s create each type of plot one by one.

Creating the Dataframe:

Python Code:

# Import packages
import numpy as np
import pandas as pd

# Create a DataFrame of 10 rows x 4 columns of random values in [0, 1)
df = pd.DataFrame(np.random.rand(10, 4), columns=('col_1', 'col_2', 'col_3', 'col_4'))
# Since the values are randomly generated, they will differ every time you run this code.

# Display the DataFrame
print(df)
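Because np.random.rand() returns different values on every run, your plots will not match the article’s screenshots exactly. If you want repeatable results, one option (our suggestion, not the author’s setup) is NumPy’s seeded Generator; the seed 42 below is arbitrary.

```python
import numpy as np
import pandas as pd

# A seeded Generator yields the same "random" values on every run
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary seed
df = pd.DataFrame(rng.random((10, 4)), columns=["col_1", "col_2", "col_3", "col_4"])

print(df.shape)  # (10, 4)
```

All the plotting calls in this article work unchanged on a DataFrame built this way.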


This is the DataFrame we will use for all the visualizations. We will plot graphs with the .plot() method of DataFrame and Series. For a DataFrame, .plot() is a convenient way to plot all of the columns at once, with labels.

Line plot:

A line plot can be created with the DataFrame.plot() function.

df.plot()


We get a line plot of every column in df without passing any arguments to .plot(). We can also plot one column against another. Let’s see another example:

df.plot(x="col_1", y="col_2")


In this example, we plotted a line graph of col_2 against col_1 by passing the column names as the x and y arguments.
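Because col_1 holds random values in no particular order, the line in this plot zigzags back and forth along the x-axis. A small sketch (our suggestion, using a seeded stand-in for df) sorts the rows by the x column first so the line runs left to right:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).random((10, 4)),
                  columns=["col_1", "col_2", "col_3", "col_4"])

# Sort by the x column so consecutive points are joined in increasing x order
ax = df.sort_values("col_1").plot(x="col_1", y="col_2", marker="o")
ax.set_ylabel("col_2")
```

marker="o" is optional; it simply makes the individual data points visible along the line.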

We can also generate subplots for individual columns. Let’s see an example:

df.plot(subplots=True, figsize=(8, 8));


A separate line subplot is generated for each column in the DataFrame.
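With subplots=True, .plot() returns a NumPy array of Axes, one per column. As a sketch (seeded stand-in data; the 2 x 2 layout and figure size are illustrative), the layout argument arranges the panels in a grid, and the returned array can be used to customize each panel:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).random((10, 4)),
                  columns=["col_1", "col_2", "col_3", "col_4"])

# layout=(2, 2) places the four subplots in a 2 x 2 grid instead of a single column
axes = df.plot(subplots=True, layout=(2, 2), figsize=(8, 6), sharex=True)

print(axes.shape)  # (2, 2)
for ax, name in zip(axes.ravel(), df.columns):
    ax.set_title(name)  # label each panel with its column name
```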

Bar plot:

Now, we will create bar plots for the same DataFrame. A bar plot can be created with the DataFrame.plot.bar() function, or equivalently with DataFrame.plot(kind="bar").

df.plot(kind="bar")


We can see that bars are generated for all the columns. Let’s specify some options for the plot.

df.plot.bar(stacked=True);


In this bar plot, the bars are stacked.

df.plot.barh(stacked=True);

This plot uses DataFrame.plot.barh(), so the stacked bars are drawn horizontally.
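df.plot(kind="barh") and df.plot.barh() are two spellings of the same call. A sketch with a seeded stand-in for df that also labels the axes and confirms there is one bar segment per cell of the DataFrame:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).random((10, 4)),
                  columns=["col_1", "col_2", "col_3", "col_4"])

ax = df.plot.barh(stacked=True, figsize=(6, 5))
ax.set_xlabel("stacked value")  # for horizontal bars, x is the value axis
ax.set_ylabel("row index")

# One rectangle per (row, column) pair: 10 rows x 4 columns
print(len(ax.patches))  # 40
```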

Histogram plot:

Now, let’s generate a histogram for df. A histogram can be created with the DataFrame.plot.hist() function.

df.plot.hist()

Now, let’s create a histogram with some other features.

df.plot.hist(stacked=True, bins=20);


This is a stacked histogram with 20 bins.

df.plot.hist(orientation="horizontal", cumulative=True);


Here, the histogram is drawn horizontally and shows cumulative frequencies.
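To see exactly which counts the histogram bars represent, one approach (a sketch, assuming a seeded stand-in for df) is to fix the bin edges yourself and compare them with numpy.histogram. Explicit edges are used here because, when bins is an integer, Pandas computes one shared set of edges from all columns together, so per-column NumPy counts with an integer bins value would not line up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).random((10, 4)),
                  columns=["col_1", "col_2", "col_3", "col_4"])

# Five equal-width bins covering [0, 1], the range of the random values
edges = np.linspace(0, 1, 6)

# alpha makes the overlapping (unstacked) histograms partly transparent
ax = df.plot.hist(bins=edges, alpha=0.5)

# Counts behind the col_1 bars, computed directly with NumPy
counts, _ = np.histogram(df["col_1"], bins=edges)
print(counts.sum())  # 10, one count per row
```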

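Let’s create a histogram for each column individually.

df.hist();

DataFrame.hist() draws one histogram per column, arranged as subplots. The related call df.diff().hist() does the same for the differences between consecutive rows rather than for the raw values.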

Box plot:

Now, we will create a box plot. A box plot can be created with the DataFrame.plot.box() function or with DataFrame.boxplot().

df.plot.box()


Or we can write it like this:

df.boxplot()


Both lines of code above produce the same box plot, apart from minor styling differences (for example, DataFrame.boxplot() draws a grid by default).

Now, let’s generate the box plots horizontally.

df.plot.box(vert=False, positions=[1, 2, 3, 4]);

The box plots are now drawn horizontally: vert=False flips the orientation, and positions sets the location of each box along the category axis.
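Each box spans the first to third quartile of a column, with a line at the median. As a sketch (assuming a seeded stand-in for df), the same summary numbers can be read off with DataFrame.quantile(), which is handy for checking what a box plot is showing:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).random((10, 4)),
                  columns=["col_1", "col_2", "col_3", "col_4"])

# 25th, 50th (median) and 75th percentiles of every column:
# the lower box edge, the middle line and the upper box edge
quartiles = df.quantile([0.25, 0.5, 0.75])
print(quartiles.round(2))

ax = df.plot.box(title="Quartiles per column")
```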

Area plot:

Now, we will create an area plot. An area plot can be created with the DataFrame.plot.area() function.

df.plot.area()

This is the area plot for df. By default, the areas are stacked.

Now, we will create an unstacked area plot.

df.plot.area(stacked=False)

This area plot is unstacked because we passed stacked=False.
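In the stacked version, each band sits on top of the columns before it, so the upper edge of a band is a running total across the columns. A quick sketch (again with a seeded stand-in for df) makes this concrete with DataFrame.cumsum(axis=1):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).random((10, 4)),
                  columns=["col_1", "col_2", "col_3", "col_4"])

# Running total across columns: the upper edge of each stacked band
upper_edges = df.cumsum(axis=1)

ax = df.plot.area()  # stacked=True is the default

# The top of the last band equals each row's total
print(np.allclose(upper_edges["col_4"], df.sum(axis=1)))  # True
```

This is also why stacked area plots can be misleading when some values are negative: the bands no longer represent a simple running total above zero.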

Scatter plot:

Now, let’s generate a scatter plot. A scatter plot can be created with the DataFrame.plot.scatter() function. A scatter plot requires two arguments, x and y, so we pass the names of the columns to plot on the x and y axes.

df.plot.scatter(x='col_1', y='col_3');

This is the scatter plot of col_1 against col_3 in df. Let’s apply some styles.

ax=df.plot.scatter(x="col_1", y="col_3", color="red", marker="*", s=100)

df.plot.scatter(x="col_2", y="col_4", color="orange", s=100, ax=ax)


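This plot overlays two scatters on the same axes: col_1 against col_3 as red stars, and col_2 against col_4 in orange (passing ax=ax draws the second plot on the first plot’s axes). We have also styled the colour, marker and size of the points. Let’s see another style of scatter plot.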

df.plot.scatter(x="col_2", y="col_4", c='col_1', s=100)


The c keyword takes a column name (here col_1), and each point is coloured according to that column’s value.
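The colour scale can be changed with the colormap argument. In this sketch (seeded stand-in data; "viridis" is just one of Matplotlib’s built-in colormap names), recent Pandas versions also add a colour bar automatically because c refers to a column:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).random((10, 4)),
                  columns=["col_1", "col_2", "col_3", "col_4"])

# Colour each point by col_1 using the "viridis" colormap
ax = df.plot.scatter(x="col_2", y="col_4", c="col_1", colormap="viridis", s=100)

# The scatter points live in the first collection on the axes: one (x, y) pair per row
points = ax.collections[0].get_offsets()
print(points.shape)  # (10, 2)
```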

Pie chart:

A pie plot can be created with the DataFrame.plot.pie() or Series.plot.pie() function. Since a pie chart represents a single column, we will first create a Series. Let’s create a Series named pie.

pie = pd.Series(np.random.rand(5))

pie

Now, let’s create a pie chart.

pie.plot.pie();
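By default the wedges are labelled with the Series index (0 to 4 here). A small sketch, with hypothetical labels A to E and a seeded generator, adds readable labels and percentage annotations; autopct is passed through to Matplotlib’s pie():

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd
from matplotlib.patches import Wedge

# Hypothetical labels A-E; the index values become the wedge labels
pie = pd.Series(np.random.default_rng(0).random(5), index=list("ABCDE"), name="share")

# autopct prints each wedge's percentage of the total with one decimal place
ax = pie.plot.pie(autopct="%.1f%%", figsize=(5, 5))
ax.set_ylabel("")  # hide the default y-label (the Series name)

wedges = [p for p in ax.patches if isinstance(p, Wedge)]
print(len(wedges))  # 5
```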

A pie chart can also be created for a DataFrame, but it generates a separate pie for each column, as subplots. Let’s create a pie chart for a DataFrame too.

First, we will create a DataFrame with three columns and then generate a pie chart for each column.

# Create a DataFrame of 6 rows x 3 columns of random values
df2 = pd.DataFrame(np.random.rand(6, 3),
                   columns=('col_1', 'col_2', 'col_3'))

# Display the DataFrame
df2

This is the DataFrame df2 for the pie chart. Now, let’s generate the chart.

df2.plot.pie(subplots=True, figsize=(15, 15))


These are the pie chart subplots for each column in the DataFrame df2.
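With subplots=True the call returns one Axes per column, and each pie may also draw a legend by default, which can clutter small charts. A sketch, assuming a seeded stand-in for df2 and an illustrative figure size, that turns the legends off and checks the result:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.default_rng(0).random((6, 3)),
                   columns=["col_1", "col_2", "col_3"])

# One pie per column, side by side, without per-pie legends
axes = df2.plot.pie(subplots=True, figsize=(12, 4), legend=False)

print(np.size(axes))  # 3, one Axes per column
```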

This brings us to the end of this article. We discussed how to visualize data with the data analysis library Pandas without importing any additional data visualization library. I hope you enjoyed reading it. Do let me know your comments and feedback in the comment section.

Thanks for reading.


The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
