We all love exploring data. A major part of a data scientist’s work is to represent data and to interpret or extract important information from it, which is called exploratory data analysis. There are many ways to represent data. One of the most important is the bar plot, which is widely used in applications and presentations. This tutorial explains what a bar plot is, how to implement a bar graph in Python, and how bar plots work in Matplotlib, so that it answers the common questions about Python bar charts.
In this article, you will learn how to create a Python bar plot using Matplotlib. We’ll explore how to generate a bar chart in Python and visualize data effectively. By the end, you’ll be able to plot a bar graph and understand the nuances of bar charts in Python programming.
Learning Objectives
A bar graph is a graphical representation of data that uses rectangular bars to compare categories. The length or height of each bar is proportional to the value it represents. One axis shows the categories, while the other shows values or counts. Bar plots in Python can be drawn either vertically or horizontally; the vertical version is often called a column chart. Sorting the bars from the highest to the lowest count, usually with a cumulative-percentage line added, gives a Pareto chart, which makes the relative significance of the categories easy to see.
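As a rough illustration of the Pareto idea, the sketch below sorts a few categories from high to low and overlays a cumulative-percentage line. The category names and counts are hypothetical and exist only for this example.

```python
import matplotlib.pyplot as plt

# Hypothetical complaint counts, invented for illustration only.
counts = {"Late delivery": 12, "Damaged item": 30, "Wrong item": 7, "Billing error": 21}

# Sort the categories from the highest count to the lowest.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
labels = [name for name, _ in ordered]
values = [value for _, value in ordered]

# Running share of the total, in percent, for the Pareto line.
total = sum(values)
cumulative = []
running = 0
for value in values:
    running += value
    cumulative.append(100 * running / total)

positions = list(range(len(labels)))
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(positions, values, color="steelblue")
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_ylabel("Count")

# A second y-axis on the same x-axis carries the cumulative percentage.
ax_pct = ax.twinx()
ax_pct.plot(positions, cumulative, color="darkorange", marker="o")
ax_pct.set_ylim(0, 110)
ax_pct.set_ylabel("Cumulative %")
ax.set_title("Pareto chart (illustrative data)")
```

Call plt.show() afterwards to display the figure when running this as a script.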
Histograms illustrate the distribution of numerical data, most often continuous data, although they can also be used for discrete numerical data. The data is divided into bins (intervals), and the height of each bar shows how many data points fall within that bin. Common uses of histograms include understanding data variability, identifying outliers, and assessing the overall shape of a dataset.
A bar plot represents categorical data, dividing it into distinct categories. Each category has a separate bar, with the height indicating the frequency or count of data points in that category.
So, the choice between a histogram and a bar plot in Python depends on the type of data you are working with. If you have continuous data, a histogram is an appropriate choice. If you have categorical data, a bar plot is an appropriate choice. If you have ordinal data that can be ordered, such as star ratings or levels of education, consider using a bar plot.
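The difference can be sketched side by side. The heights and ratings below are hypothetical values chosen only to show one continuous and one ordinal variable.

```python
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt

# Continuous data: hypothetical heights in centimetres.
heights = np.array([150.2, 161.5, 158.9, 172.3, 169.0, 181.4, 165.7, 175.1])
# Ordinal data: hypothetical star ratings.
ratings = ["3 stars", "5 stars", "4 stars", "5 stars", "4 stars", "5 stars"]

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: the values are grouped into 4 equal-width bins.
bin_counts, bin_edges, _ = ax_hist.hist(heights, bins=4, edgecolor="black")
ax_hist.set_title("Histogram (continuous data)")
ax_hist.set_xlabel("Height (cm)")

# Bar plot: one bar per category, with the count as its height.
rating_counts = Counter(ratings)
categories = sorted(rating_counts)
ax_bar.bar(categories, [rating_counts[c] for c in categories], color="green")
ax_bar.set_title("Bar plot (ordinal data)")
```

In the histogram the x-axis is a numeric scale split into bins, while in the bar plot each tick is a separate category.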
Here are some common use cases for bar plots:
In Python, we use several libraries to create bar plots. They are very useful for visualizing data and extracting meaningful information from datasets.
Here are some Python libraries we use to create a bar chart.
Matplotlib is a plotting library widely used for data exploration and visualization. It is simple and provides an API similar to the plotting functions in MATLAB. The Matplotlib bar() function is the easiest way to create a bar chart. We import matplotlib.pyplot as plt and use:
plt.bar(x, height, width, bottom, align)
The code to create a bar plot in matplotlib:
The bar width in bar charts can be controlled using the “width” parameter of the bar() function in the Matplotlib library. It sets the width of each bar and defaults to 0.8. For example, to set the bar width to 0.5, you can write the following code:
import numpy as np
import matplotlib.pyplot as plt
# Dataset generation
data_dict = {'CSE':33, 'ECE':28, 'EEE':30}
courses = list(data_dict.keys())
values = list(data_dict.values())
fig = plt.figure(figsize = (10, 5))
# Bar plot
plt.bar(courses, values, color='green', width=0.5)
plt.xlabel("Courses offered")
plt.ylabel("No. of students enrolled")
plt.title("Students enrolled in different courses")
plt.show()
You can also use the np.arange() function or the np.linspace() function to create numpy arrays, which can be plotted. You can also use the plt.subplots() function to create multiple plots in the same Python figure.
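A minimal sketch of both ideas, np.arange() for bar positions and plt.subplots() for two plots in one figure, could look like this. The "graduated" numbers are hypothetical and added only so there is a second series to draw.

```python
import numpy as np
import matplotlib.pyplot as plt

courses = ["CSE", "ECE", "EEE"]
enrolled = [33, 28, 30]
# Hypothetical second series, invented only to have something to compare.
graduated = [29, 25, 27]

# np.arange gives evenly spaced integer positions: array([0, 1, 2]).
x = np.arange(len(courses))

# One figure with two side-by-side axes that share the y-axis scale.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

ax_left.bar(x, enrolled, width=0.5, color="green")
ax_left.set_title("Enrolled")

ax_right.bar(x, graduated, width=0.5, color="gray")
ax_right.set_title("Graduated (hypothetical)")

# Replace the numeric positions with the course names on both axes.
for ax in (ax_left, ax_right):
    ax.set_xticks(x)
    ax.set_xticklabels(courses)

fig.tight_layout()
```

Sharing the y-axis keeps the two panels directly comparable; drop sharey=True if the series are on different scales.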
Reference: https://matplotlib.org/
Seaborn is also a visualization library based on matplotlib and is widely used for presenting data. We can import the library as sns and use the following syntax:
sns.barplot(x=' ', y=' ', data=df)
The code to create a bar chart in seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('tips')
sns.barplot(x = 'time',y = 'total_bill',data = df)
plt.show()
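By default, seaborn's barplot() draws the mean of the y variable for each category rather than the raw rows. The sketch below reproduces that aggregation with pandas on a tiny hypothetical sample shaped like the tips data, so the numbers are not the real dataset values.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Tiny hypothetical sample with the same column names as the tips dataset.
sample = pd.DataFrame({
    "time": ["Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
    "total_bill": [10.0, 14.0, 20.0, 25.0, 30.0],
})

# Mean bill per category: the quantity a default bar plot of this kind shows.
mean_bill = sample.groupby("time")["total_bill"].mean()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(mean_bill.index, mean_bill.values, color="steelblue")
ax.set_xlabel("time")
ax.set_ylabel("mean total_bill")
```

Seeing the aggregation spelled out makes it easier to check what a bar height actually means before interpreting the chart.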
Plotly is a visualization library for interactive charts: you can zoom into regions of a chart and hover over it to read the underlying values. It also supports higher-dimensional data and many data science and machine learning visualizations. We import plotly.express as px and use:
px.bar(df, x=' ', y=' ')
The following is code to create a bar chart in Plotly:
import plotly.express as px
data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
fig.show()
Unstacked (grouped) bar plots place the bars of each group side by side, so a category can be compared across samples or over time. You can deduce insights from the patterns observed through this comparison. The plot produced by the code below shows the FIFA ratings of four players from 2018 to 2020. Django and Gafur have improved in every year. This shows us their progression, so a club can use it when deciding whether to sign Django or Gafur.
import pandas as pd
import matplotlib.pyplot as plt
plotdata = pd.DataFrame({
"2018":[57,67,77,83],
"2019":[68,73,80,79],
"2020":[73,78,80,85]},
index=["Django", "Gafur", "Tommy", "Ronnie"])
plotdata.plot(kind="bar",figsize=(15, 8))
plt.title("FIFA ratings")
plt.xlabel("Footballer")
plt.ylabel("Ratings")
You can also use the Pandas read_csv() function to import data in a CSV file format into a Pandas dataframe for plotting.
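To make the progression easier to read than in the grouped chart, one option is to plot the change in rating between the first and last year. This is only a sketch of one possible follow-up; it reuses the ratings from the example above.

```python
import pandas as pd
import matplotlib.pyplot as plt

plotdata = pd.DataFrame({
    "2018": [57, 67, 77, 83],
    "2019": [68, 73, 80, 79],
    "2020": [73, 78, 80, 85]},
    index=["Django", "Gafur", "Tommy", "Ronnie"])

# Rating gained between 2018 and 2020, largest improvement first.
change = (plotdata["2020"] - plotdata["2018"]).sort_values(ascending=False)

ax = change.plot(kind="bar", figsize=(8, 4), color="teal")
ax.set_title("Change in FIFA rating, 2018 to 2020")
ax.set_xlabel("Footballer")
ax.set_ylabel("Rating change")
```

With these numbers Django gains 16 points and Gafur 11, while Tommy and Ronnie gain 3 and 2.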
As the name suggests, stacked bar charts place the bars of each series on top of one another. Whereas the unstacked chart compares the series within each group, a stacked chart also shows the total for each individual. In pandas, this is easy to implement using the stacked=True keyword.
import pandas as pd
import matplotlib.pyplot as plt
plotdata = pd.DataFrame({
"2018":[57,67,77,83],
"2019":[68,73,80,79],
"2020":[73,78,80,85]},
index=["Django", "Gafur", "Tommy", "Ronnie"])
plotdata.plot(kind='bar', stacked=True,figsize=(15, 8))
plt.title("FIFA ratings")
plt.xlabel("Footballer")
plt.ylabel("Ratings")
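The same stacking can be done directly in Matplotlib with the bottom parameter of bar(), which sets where each bar starts. The sketch below is one way to do it, keeping a running total per player.

```python
import numpy as np
import matplotlib.pyplot as plt

players = ["Django", "Gafur", "Tommy", "Ronnie"]
ratings = {
    "2018": [57, 67, 77, 83],
    "2019": [68, 73, 80, 79],
    "2020": [73, 78, 80, 85],
}

fig, ax = plt.subplots(figsize=(8, 5))

# Running height of the stack for each player; every new series starts here.
bottom = np.zeros(len(players))
for year, values in ratings.items():
    ax.bar(players, values, bottom=bottom, label=year)
    bottom += np.array(values)

ax.set_title("FIFA ratings (stacked)")
ax.set_xlabel("Footballer")
ax.set_ylabel("Ratings")
ax.legend()
```

After the loop, bottom holds the total height of each stack, which is the sum of the three yearly ratings for that player.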
Now, let us apply this syntax to a dataset and see how we can plot bar charts using different libraries. For this, we will use the Summer Olympics Medals 1976-2008 dataset and visualize it using bar graphs to perform univariate, bivariate, and multivariate analysis and interpret relevant information from it.
The Summer Olympics dataset from 1976 to 2008 is available here.
In exploratory data analysis, Univariate analysis refers to visualizing one variable. In our case, we want to visualize column data using a bar plot.
All-time medals of top 10 countries:
top_10 = df['Country'].value_counts()[:10]
top_10.plot(kind='bar',figsize=(10,8))
plt.title('All Time Medals of top 10 countries')
The graph shows the 10 countries that have won the most Olympic medals. The USA has dominated the Olympics over the years.
Medals won by the USA in Summer Olympics:
indpie = df[df['Country']=='United States']['Medal'].value_counts()
indpie.plot(kind='bar',figsize=(10,8))
We filter the country to the USA and visualize the medals won by the USA alone.
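Since the dataset itself is not loaded in the snippets above, here is a self-contained sketch of the same two steps on a tiny hypothetical table. Only the column names Country and Medal come from the code above; the rows are invented.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows, one per awarded medal, for illustration only.
medals = pd.DataFrame({
    "Country": ["United States", "United States", "United States",
                "Australia", "Australia", "Germany"],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Gold", "Silver"],
})

# value_counts() counts rows per country, sorted from most to fewest.
top_2 = medals["Country"].value_counts().head(2)

# Filter to one country first, then count its medals by type.
usa_medals = medals.loc[medals["Country"] == "United States", "Medal"].value_counts()

fig, (ax_top, ax_usa) = plt.subplots(1, 2, figsize=(10, 4))
top_2.plot(kind="bar", ax=ax_top, title="Top 2 countries (illustrative)")
usa_medals.plot(kind="bar", ax=ax_usa, title="USA medals by type (illustrative)")
fig.tight_layout()
```

Because value_counts() already sorts in descending order, the bars come out ranked without any extra sorting step.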
The bivariate analysis includes two variables or two columns from our dataset.
Total athletes’ contribution to Summer Olympics over time:
plt.figure(figsize=(10, 5))
sns.countplot(x='Year', data=df)
plt.title('Total Athletes contribution in summer olympics over time')
plt.xlabel('Years')
plt.ylabel('No. of Athlete')
Over the years there has been an increase in the participation of athletes in the Olympics.
Top 10 athletes with the most awarded medals:
athlete_order = df['Athlete'].value_counts().head(10).index
plt.figure(figsize=(9, 5))
sns.countplot(data=df, y='Athlete', order=athlete_order)
plt.title('Top 10 Athletes with the most awarded Medals')
plt.xlabel('No. of awarded medals')
plt.ylabel('Athlete Name');
This type of plot is called a horizontal bar chart. It shows the top 10 athletes, and we can see that Michael Phelps has won the most medals in the Olympics.
Sports with most awarded medals:
plt.figure(figsize=(15, 5))
highest_sport = df['Sport'].value_counts().index
sns.countplot(data=df, x='Sport', order=highest_sport)
plt.xticks(rotation=75)
plt.title('Sports with most awarded Medals')
plt.xlabel('Sport')
plt.ylabel('No. of Medals')
Aquatics accounts for the largest number of medals in the Olympics. Note that the x-axis tick labels are rotated by 75 degrees so that the sport names do not overlap.
Type of medals won over the years:
# Set the figure size before plotting so that it applies to this figure
sns.set(rc={'figure.figsize':(10,10)})
sns.countplot(x='Year',hue='Medal',data=df)
plt.title("Type of medals won over the years")
The graph is an unstacked barplot and shows medal grouping over each year.
Medals by gender:
sns.countplot(x="Medal", hue="Gender", data=df)
The gender bar plot tells us that more medals have gone to men, either because more men have participated in the Olympics or because there have been more events in the men’s category.
The gender ratio in Summer Olympics:
gender_group = df.groupby(['Year', 'Gender']).size().unstack()
gender_group.apply(lambda x:x/x.sum(), axis=1).plot(kind='barh', stacked=True, legend=False)
plt.legend(['Men', 'Women'], bbox_to_anchor=(1.0, 0.7))
plt.xlabel('Men / Women ratio')
The data tells us the ratio of men to women in the Olympics over the years. Here, we can see that more games have started including the women’s category, which is a great sign.
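The normalization step can be checked on a small hypothetical table. The sketch below divides each year's counts by that year's total, which should match the row-wise lambda used above; the rows are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows, one per medal, for illustration only.
rows = pd.DataFrame({
    "Year": [1976, 1976, 1976, 1976, 2008, 2008, 2008, 2008],
    "Gender": ["Men", "Men", "Men", "Women", "Men", "Men", "Women", "Women"],
})

# Count rows per (Year, Gender) and pivot Gender into columns.
gender_group = rows.groupby(["Year", "Gender"]).size().unstack()

# Divide every row by its own total so that each year sums to 1.
ratio = gender_group.div(gender_group.sum(axis=1), axis=0)

ax = ratio.plot(kind="barh", stacked=True, figsize=(8, 3))
ax.set_xlabel("Share of medals")
ax.legend(bbox_to_anchor=(1.0, 0.7))
```

Because every row sums to 1, all stacked bars have the same length and only the split between men and women changes from year to year.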
Medals by gender in each discipline:
# Set the figure size before plotting so that it applies to this figure
sns.set(rc={'figure.figsize':(10,10)})
sns.countplot(y='Discipline',hue='Gender',data=df)
plt.xticks(rotation=90)
plt.title('Medals by Gender in each Discipline')
plt.legend(loc=1) # 1 is the code for 'upper right'
This graph shows each gender’s participation in the specific discipline.
The scatter plot is another type of plot for bivariate visualization of numerical data, where the x-axis and y-axis represent the values of two different variables.
Multivariate analysis is used when we want to compare more than two variables. A box plot is often a good representation, as shown here.
sns.catplot(x="Medal", y="Year", hue="Gender",kind="box", data=df)
For each of the three medal types, this graph compares how the medals won by men and by women are distributed over the years.
Use bar plots to visualize time series data by placing time on the x-axis and values on the y-axis. In this case, each bar represents a single data point, with its height indicating the value and its position along the x-axis showing when you recorded the data point.
To plot time series data as a bar plot in Python, you can use the bar() function of the Matplotlib library. First, convert the time series data into a suitable format, such as a list, NumPy array, or pandas column, that can be passed to bar(). Then use plt.xticks() to control the x-axis labels, which represent the time values in the series. If you want, you can also customize the y-axis with plt.yticks().
For example, to plot a time series data with time values in the format “YYYY-MM-DD” and data values as integers, you can write the following Python code:
import matplotlib.pyplot as plt
import pandas as pd
# Example time series data
time = ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04']
data = [100, 120, 130, 140]
# Create a pandas dataframe from the time series data
df = pd.DataFrame({'Time': time, 'Data': data})
# Plot the time series data as a bar plot
plt.bar(df['Time'], df['Data'], width=0.8)
plt.xticks(rotation=90)
plt.show()
The above code will create a bar plot with the time values along the x-axis and the data values along the y-axis. You can customize the appearance of the plot by adjusting various parameters, such as the width of the bars, the color of the bars, and the labels for the x and y axes.
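If the dates arrive as strings, one possible refinement is to parse them with pd.to_datetime() and format the tick labels explicitly. The sketch below assumes the same four data points; the label format and y-tick spacing are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

df_ts = pd.DataFrame({
    "Time": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    "Data": [100, 120, 130, 140],
})

# Parse the strings into real timestamps so they can be reformatted.
df_ts["Time"] = pd.to_datetime(df_ts["Time"])

fig, ax = plt.subplots(figsize=(8, 4))

# Draw one bar per observation at integer positions 0..n-1.
positions = list(range(len(df_ts)))
ax.bar(positions, df_ts["Data"], width=0.8, color="steelblue")

# Label each position with its date, written as day, month, year.
ax.set_xticks(positions)
ax.set_xticklabels(df_ts["Time"].dt.strftime("%d %b %Y").tolist(), rotation=45)

# Custom y-axis ticks every 40 units (arbitrary spacing for this example).
ax.set_yticks(list(range(0, 161, 40)))
ax.set_xlabel("Time")
ax.set_ylabel("Data")
fig.tight_layout()
```

Using integer positions with explicit labels keeps the bars evenly spaced even when the dates themselves are irregular.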
In this article, we have gone through different implementations of bar plots in Python and looked at the different types of bar graphs we use for exploratory data analysis. When creating a bar plot in Python, it is important to choose an appropriate type based on the nature of the data and the information you want to communicate.
Hope you like the article! Creating a Python bar plot with Matplotlib allows you to visualize data effectively. Use bar charts to compare values easily. A bar graph in Python is simple to implement and provides clear insights.
A. We can plot a bar graph in Python using the Matplotlib library’s “bar()” function.
A. Some of the most common types of bar plots in Python are:
1. Simple bar plot: A bar plot representing a single data set, where each bar represents a single data point.
2. Grouped bar plot: A bar plot representing multiple sets of data, where each group represents a separate data set.
3. Stacked bar plot: A bar plot showing multiple sets of data stacked on top of one another, where the total height of each bar represents the sum of the values across the data sets.
4. Horizontal bar plot: A bar plot rotated by 90 degrees, with the categories on the vertical axis and the values on the horizontal axis.
5. Error bar plot: A bar plot that includes error bars representing the data’s uncertainty.
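As a minimal sketch of the error bar plot, bar() accepts a yerr argument for the size of each error bar. The measurements below are hypothetical, and using the sample standard deviation as the error is just one common choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical repeated measurements per group, for illustration only.
samples = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 9.0, 11.0],
    "C": [2.0, 2.0, 2.0],
}

groups = list(samples)
means = [float(np.mean(samples[g])) for g in groups]
# Sample standard deviation (ddof=1) used as the error bar length.
errors = [float(np.std(samples[g], ddof=1)) for g in groups]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(groups, means, yerr=errors, capsize=5, color="lightgreen", edgecolor="black")
ax.set_ylabel("Mean value")
ax.set_title("Bar plot with error bars (illustrative data)")
```

Group C has no spread, so its bar is drawn with a zero-length error bar.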
A. Commonly used Python objects for plotting a bar graph are lists, Numpy arrays, and Pandas dataframes.
A. matplotlib.pyplot.bar is a Python function that creates a vertical bar chart to display data. This function belongs to the matplotlib library and typically helps users visualize categorical data. The bar function accepts parameters like the x-coordinates of the bars, the height of the bars, and the width of the bars. Users can customize it with optional parameters such as color and label, while titles and axis labels are set with separate functions such as plt.title() and plt.xlabel(). Overall, matplotlib.pyplot.bar is a useful tool for creating clear and informative visualizations of data.
A bar plot is a type of chart used to visualize categorical data. It displays rectangular bars, with the height or length of each bar representing the value of a specific category. Python libraries like Matplotlib and Seaborn are commonly used to create bar plots.