Descriptive Statistics: Definitions, Types and Examples

Illiyas Last Updated : 14 Jan, 2025


The first step of any data-related process is the collection of data. Once we have collected the data, what do we do with it? Data can be sorted, analyzed, and used in various ways and formats, depending on the project’s needs. While analyzing a dataset, we use statistical methods to arrive at a conclusion. Data-driven decision-making also depends on how efficiently we use these methods. Two types of statistical methods are widely used in data analysis: descriptive and inferential. This article focuses on descriptive statistics: its types, calculations, and examples.

This article was published as a part of the Data Science Blogathon.

What is Descriptive Statistics?
Types of Statistics
What is Inferential Statistics?
Types of Descriptive Statistics
Descriptive Statistics Based on the Central Tendency of Data
Descriptive Statistics Based on the Dispersion of Data
Descriptive Statistics Based on the Shape of the Data
Univariate Data vs. Bivariate Data in Descriptive Statistics
What are the 10 commonly used descriptive statistics?
Can Descriptive Statistics be used to make inferences or predictions?
Frequently Asked Questions

What is Descriptive Statistics?

Descriptive statistics serves as the initial step in understanding and summarizing data. It involves organizing, visualizing, and summarizing raw data to create a coherent picture. The primary goal of descriptive statistics is to provide a clear and concise overview of the data’s main features. This helps us identify patterns, trends, and characteristics within the data set without making broader inferences.

Key Aspects of Descriptive Statistics

Measures of Central Tendency: Descriptive statistics include calculating the mean, median, and mode, which offer insights into the center of the data distribution.
Measures of Dispersion: Variance, standard deviation, and range help us understand the spread or variability of the data.
Visualizations: Graphs, histograms, bar charts, and pie charts visually represent the data’s distribution and characteristics.

Types of Statistics

When you delve into the world of statistics, you’ll encounter two fundamental branches: descriptive statistics and inferential statistics. These two distinct approaches help us make sense of data and draw conclusions. Let’s look at the differences between these two branches to shed light on their roles in statistical analysis.

Aspect	Descriptive Statistics	Inferential Statistics
Purpose	Summarize and describe data	Draw conclusions or predictions
Data Sample	Analyzes the entire dataset	Analyzes a sample of the data
Examples	Mean, Median, Range, Variance	Hypothesis testing, Regression
Scope	Focuses on data characteristics	Makes inferences about populations
Goal	Provides insights and simplifies data	Generalizes findings to a larger population
Assumptions	No assumptions about populations	Requires assumptions about populations
Common Use Cases	Data visualization, data exploration	Scientific research, hypothesis testing

What is Inferential Statistics?

Inferential statistics takes data analysis to the next level by drawing conclusions about populations based on a sample. It involves making predictions, generalizations, and hypotheses about a larger group using a smaller subset of data. Inferential statistics bridges the gap between our data and the conclusions we want to reach. This is particularly useful when obtaining data from an entire population is impractical or impossible.

Key Aspects of Inferential Statistics

Sampling Techniques: Inferential statistics relies on carefully selecting representative samples from a population to make valid inferences.
Hypothesis Testing: This process involves setting up hypotheses about population characteristics and using sample data to determine if these hypotheses are statistically significant.
Confidence Intervals: These provide a range of values within which we’re confident a population parameter lies based on sample data.
Regression Analysis: Inferential statistics also encompass techniques like regression analysis to model relationships between variables and predict outcomes.

Now we will look at descriptive statistics in detail.

Types of Descriptive Statistics

There are various dimensions in which this data can be described. The three main dimensions used for describing data are the central tendency, dispersion, and the shape of the data. Now, let’s look at them in detail, one by one.

Descriptive Statistics Based on the Central Tendency of Data

The central tendency of data is the center of the distribution of data. It describes the location of data and concentrates on where the data is located. The three most widely used measures of the “center” of the data are Mean, Median, and Mode.

Mean

The “Mean” is the average of the data. The average can be identified by summing up all the numbers and then dividing them by the number of observations.

Mean = (X₁ + X₂ + X₃ + … + Xₙ) / n

Example:

Data – 10,20,30,40,50 and Number of observations = 5
Mean = [ 10+20+30+40+50 ] / 5
Mean = 30
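
As a minimal sketch, the same calculation can be written in Python. The variable names are illustrative, and `fmean` from the standard `statistics` module is used only as a cross-check on the manual sum-and-divide step.

```python
from statistics import fmean

data = [10, 20, 30, 40, 50]

# Sum all observations, then divide by the number of observations.
manual_mean = sum(data) / len(data)

# The standard library computes the same value (always as a float).
library_mean = fmean(data)

print(manual_mean, library_mean)  # 30.0 30.0
```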

The mean may be strongly influenced by outliers. You may now ask, ‘What are outliers?’ An outlier is a data point that differs significantly from the other observations. A single outlier can pull the mean far away from where most of the data lies and distort the analysis.

Example:

Data – 10,20,30,40,200
Mean = [ 10+20+30+40+200 ] / 5
Mean = 60

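Solution for the outliers problem: If the outliers are errors or are not relevant to the question being asked, removing them before taking the average gives a more representative result. Otherwise, a measure that is less sensitive to outliers, such as the median, is a better choice.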

Median

It is the 50th percentile of the data. In other words, it is exactly the center point of the data. We identify the median by ordering the data, splitting it into two equal parts, and then finding the number in the middle. It is a reliable way to find the center of the data when outliers are present.

Note that, in this case, the central tendency of the data is not affected by outliers.

Example:

Odd number of Data – 10,20,30,40,50
Median is 30.
Even number of Data – 10,20,30,40,50,60

Find the middle 2 data and take the mean of those two values.
Here, 30 and 40 are middle values.

Now, add them and divide the result by 2
(30 + 40) / 2 = 35
Median is 35
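
The odd and even cases can be checked with a short Python sketch using the standard `statistics` module. The third list reuses the outlier data from the Mean section to show that the median stays at 30 while the mean moves to 60; the variable names are illustrative.

```python
from statistics import fmean, median

odd_data = [10, 20, 30, 40, 50]
even_data = [10, 20, 30, 40, 50, 60]
with_outlier = [10, 20, 30, 40, 200]

# Odd count: the single middle value of the sorted data.
print(median(odd_data))   # 30

# Even count: the mean of the two middle values, (30 + 40) / 2.
print(median(even_data))  # 35.0

# The outlier 200 drags the mean upward but leaves the median unchanged.
print(fmean(with_outlier), median(with_outlier))  # 60.0 30
```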

Mode

The mode of the data is the most frequently occurring data or elements in a dataset. If an element occurs the highest number of times, it is the mode of that data. If no number in the data is repeated, then that data has no mode. There can be more than one mode in a dataset if two values have the same frequency, which is also the highest frequency.

Outliers don’t influence the mode. The mode can be calculated for both quantitative and qualitative data.

Example:

Data – 1,3,4,6,7,3,3,5,10, 3
Mode is 3, because 3 has the highest frequency (4 times)
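
A small Python sketch of the same count, using `Counter` and `multimode` from the standard library. The color list is hypothetical data added to show a qualitative dataset with two modes.

```python
from collections import Counter
from statistics import multimode

data = [1, 3, 4, 6, 7, 3, 3, 5, 10, 3]

# multimode returns every value that shares the highest frequency.
modes = multimode(data)
print(modes)  # [3]

# Counter shows the frequency behind that answer: 3 occurs 4 times.
print(Counter(data).most_common(1))  # [(3, 4)]

# Qualitative data can have a mode too, and there can be more than one.
colors = ["red", "blue", "red", "blue", "green"]
color_modes = multimode(colors)
print(color_modes)  # ['red', 'blue']

# Note: when no value repeats, multimode returns all values, whereas
# this article's convention is to say such data has no mode.
```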

Descriptive Statistics Based on the Dispersion of Data

The dispersion is the “spread of the data”. It measures how far the data is spread. In many datasets, the data values are closely located near the mean. In other datasets, the values spread widely from the mean. You can measure the dispersion of data using the Interquartile Range (IQR), range, standard deviation, and variance.


Let us see these measures in detail.

Inter Quartile Range (IQR)

Quartiles are special percentiles.
1st Quartile Q1 is the same as the 25th percentile.
2nd Quartile Q2 is the same as the 50th percentile.
3rd Quartile Q3 is the same as the 75th percentile.

Steps to find quartile and percentile

The data should be sorted from the smallest to the largest value.
For Quartiles, ordered data is divided into 4 equal parts.
For Percentiles, ordered data is divided into 100 equal parts.

The Inter Quartile Range is the difference between the third quartile (Q3) and the first quartile (Q1)

IQR = Q3 – Q1

The Inter Quartile Range is the spread of the middle half (50%) of the data.
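
The steps above can be sketched in Python with `statistics.quantiles`. The nine-value dataset is hypothetical. Several conventions exist for interpolating quartiles, so the choice of `method="inclusive"` here is an assumption, and other tools or methods may report slightly different Q1 and Q3 values for the same data.

```python
from statistics import quantiles

# Hypothetical data, deliberately unsorted; quantiles() sorts internally.
data = [90, 10, 50, 30, 70, 20, 80, 40, 60]

# n=4 splits the ordered data into four equal parts, giving Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# IQR = Q3 - Q1, the spread of the middle 50% of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 30.0 50.0 70.0 40.0
```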

Range

The range is the difference between the largest and the smallest value in the data.
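
A minimal Python sketch of the range, reusing the two datasets from the Mean section. The helper name `value_range` is illustrative. Because the range depends only on the two most extreme values, a single outlier changes it sharply.

```python
def value_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)

print(value_range([10, 20, 30, 40, 50]))   # 40
print(value_range([10, 20, 30, 40, 200]))  # 190
```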

Standard Deviation

The most common measure of spread is the standard deviation. The standard deviation measures how far the data deviates from the mean value. The standard deviation formula differs for a population and a sample. Both formulas are similar but not the same.

Symbol used for Sample Standard Deviation – “s” (lowercase)

Symbol used for Population Standard Deviation – “σ” (sigma, lower case)

Steps to find the Standard Deviation

If x is a number, then the difference “x – mean” is its deviation. The deviations are used to calculate the standard deviation.

Sample Standard Deviation, s = Square root of sample variance
Sample Standard Deviation, s = Square root of [ Σ(x − x̄)² / (n − 1) ], where x̄ is the sample mean and n is the number of observations in the sample.

Population Standard Deviation, σ = Square root of population variance
Population Standard Deviation, σ = Square root of [ Σ(x − μ)² / N ], where μ is the population mean and N is the size of the population.


The standard deviation is always positive or zero. It will be large when the data values are spread out from the mean.
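
The two formulas can be followed step by step in a short Python sketch. For illustration only, the same five values are treated first as a sample and then as a whole population; the variable names are assumptions.

```python
import math

data = [10, 20, 30, 40, 50]
n = len(data)
mean = sum(data) / n

# Deviation of each value from the mean: x - mean.
deviations = [x - mean for x in data]

# Sum of squared deviations: 400 + 100 + 0 + 100 + 400.
sum_sq = sum(d ** 2 for d in deviations)

# Sample formula divides by n - 1; population formula divides by N.
sample_sd = math.sqrt(sum_sq / (n - 1))
population_sd = math.sqrt(sum_sq / n)

print(round(sample_sd, 3), round(population_sd, 3))  # 15.811 14.142
```

The sample value is slightly larger because dividing by n − 1 instead of n compensates for estimating the mean from the same data.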

Variance

The variance is a measure of variability. It is the average squared deviation from the mean. The symbol σ² represents the population variance, and the symbol s² represents the sample variance.

Population Variance σ² = [ Σ(x − μ)² / N ]
Sample Variance s² = [ Σ(x − x̄)² / (n − 1) ]
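
As a rough sketch, Python's standard `statistics` module provides both versions directly: `variance` uses the sample formula and `pvariance` uses the population formula. The dataset is the one used in the earlier examples.

```python
import math
from statistics import pvariance, stdev, variance

data = [10, 20, 30, 40, 50]

sample_var = variance(data)       # divides by n - 1
population_var = pvariance(data)  # divides by N

print(float(sample_var), float(population_var))  # 250.0 200.0

# The standard deviation is the square root of the variance.
print(math.isclose(stdev(data) ** 2, sample_var))  # True
```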

Descriptive Statistics Based on the Shape of the Data

The shape of the data is important because the probability distribution used to model the data depends on its shape. The shape describes the type of the graph.

The shape of the data is described using three concepts: symmetry, skewness, and kurtosis.

Symmetric

In the symmetric shape of the graph, the data distributes evenly on both sides of the center. In symmetric data, the mean and median are located close together. The best-known symmetric curve is the normal (bell-shaped) curve, although not every symmetric distribution is normal.

Skewness

Skewness is the measure of the asymmetry of the distribution of data. When the data is not symmetrical, it is skewed towards one side. Skewness is classified into two types: positive skew and negative skew.

Positively skewed: In a positively skewed distribution, data values cluster around the left side, while the right tail extends longer. The mean and median are typically greater than the mode in the positive skew.
Negatively skewed: In a negatively skewed distribution, data values cluster around the right side, while the left tail extends longer. The mean and median are typically less than the mode.
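
One common way to put a number on skewness is the moment-based coefficient, the third central moment divided by the second central moment raised to the power 1.5. The sketch below assumes that population-style formula, which the article itself does not specify; statistical libraries may apply a sample-size correction and report slightly different values. The function name and datasets are illustrative.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (no sample-size correction)."""
    mean = fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [10, 20, 30, 40, 50]
right_tailed = [10, 20, 30, 40, 200]      # long tail on the right
left_tailed = [-200, -40, -30, -20, -10]  # long tail on the left

print(skewness(symmetric))         # 0.0 for perfectly symmetric data
print(skewness(right_tailed) > 0)  # True: positive skew
print(skewness(left_tailed) < 0)   # True: negative skew
```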

Kurtosis

Kurtosis measures how heavy the tails of a distribution are compared with a normal distribution. Distributions fall into three categories: platykurtic, mesokurtic, and leptokurtic.

Platykurtic: A platykurtic distribution has thin tails and a flatter peak than a normal distribution. The thin tails indicate that extreme outliers are rare.

Mesokurtic: A mesokurtic distribution has tails similar to those of a normal distribution. It follows the familiar bell-shaped curve.

Leptokurtic: A leptokurtic distribution has heavy tails and a sharper peak than a normal distribution. Much of the data sits close to the center, but extreme outliers are more likely.
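
A minimal sketch of how kurtosis can be computed, assuming the moment-based "excess kurtosis" convention: the fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores about 0. Under that assumption, negative values suggest a platykurtic shape and positive values a leptokurtic one. The function name and both datasets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (no sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Evenly spread values with no extreme points: thin tails.
flat = [1, 2, 3, 4, 5]

# Most values at the center plus two extreme points: heavy tails.
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print(round(excess_kurtosis(flat), 2))          # -1.3
print(round(excess_kurtosis(heavy_tailed), 2))  # 2.0
```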

Univariate Data vs. Bivariate Data in Descriptive Statistics

When it comes to delving into the world of data analysis, two key terms you’re likely to encounter are “Univariate” and “Bivariate.” These terms are crucial in descriptive statistics, as they help us categorize and understand the data types we’re working with. Whether you’re deciphering the properties of individual data points or unraveling the intricate dance between two variables, the concepts of univariate and bivariate data provide the foundation for insightful data analysis.

The key difference between univariate and bivariate data lies in the focus of analysis. Univariate analysis centers on understanding the characteristics of a single variable, while bivariate analysis explores connections and interactions between two variables. Let’s break down the differences between univariate and bivariate data to better grasp their significance.

Univariate Data

Univariate data focuses on a single variable, essentially spotlighting one aspect of your data. In this scenario, you want to study the distribution, central tendency, and dispersion of a single set of values. For instance, if you’re analyzing the heights of a group of individuals, you’re dealing with univariate data. Here, the variable of interest is height, and you aim to uncover insights about that specific characteristic.

In univariate analysis, you’re often looking at measures like:

Measures of Central Tendency: Mean, median, and mode provide insights into where the center of the data lies.
Measures of Dispersion: Range, variance, and standard deviation help you understand how spread out the data is.
Frequency Distribution: Creating histograms, bar charts, and pie charts allows you to visualize the data’s distribution.

Bivariate Data

Bivariate data, on the other hand, adds an extra layer of complexity to your analysis by involving two variables. Here, you not only want to understand individual characteristics but also seek to uncover relationships and patterns between two different variables. For example, if you’re examining the relationship between hours of study and exam scores, you’re working with bivariate data. The goal is to determine whether changes in one variable (study hours) are associated with changes in another (exam scores).

Bivariate analysis often involves techniques such as:

Scatter Plots: These visualizations showcase the relationship between two variables, with each data point plotted on the graph.
Correlation: Calculating correlation coefficients helps you quantify the strength and direction of the relationship between variables.
Regression Analysis: This technique allows you to model the relationship between variables, predicting the outcome of one based on the other.
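
The correlation and regression techniques above can be sketched in plain Python for the study-hours example. The five data points are hypothetical, and the helper names `pearson_r` and `least_squares_line` are illustrative; the sketch uses the Pearson correlation coefficient and a simple least-squares line, which are common choices rather than the only ones.

```python
import math

# Hypothetical data: hours studied and the exam score obtained.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 78]

def _sums(xs, ys):
    """Return means and the sums of cross products and squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy

def pearson_r(xs, ys):
    """Strength and direction of the linear relationship, from -1 to 1."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the best-fitting straight line."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

r = pearson_r(hours, scores)
slope, intercept = least_squares_line(hours, scores)

print(round(r, 3))                          # 0.998
print(round(slope, 2), round(intercept, 2)) # 6.4 45.4
```

In this made-up data, each extra hour of study goes with roughly 6.4 more points. A strong correlation describes an association in the data and does not by itself show that studying causes the higher scores.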

What are the 10 commonly used descriptive statistics?

Many useful descriptive statistics exist, but here are ten of the most commonly used measures and related concepts:

Mean: This is the average of all the values in a data set. It’s a good indicator of the overall center of the data, but can be sensitive to outliers, especially in multivariate data with extreme values.
Median: This is the ‘middle’ value when the data is ordered from least to greatest. It’s less affected by outliers than the mean, making it a robust measure for box plot analyses.
Mode: This is the most frequent value in a data set. There can be one mode, or even multiple modes in some cases, especially when dealing with categorical variables.
Standard Deviation: This tells you how spread out the data is from the mean. A larger standard deviation indicates a wider spread of data points. It’s crucial in understanding the dispersion in multivariate data.
Range: This is the difference between the highest and lowest values in the data set. It’s a simple way to gauge how much variation there is but doesn’t tell you anything about the distribution within that range. It’s often represented in graphical representations like box plots.
Categorical Variables: These are variables that represent distinct groups or categories. Analysis often involves graphical representations and contingency tables to understand the relationships between categories.
Contingency Tables: These tables display the frequency distribution of categorical variables. They help in analyzing the relationship between different categorical variables in multivariate data.
Box Plot: A graphical representation that shows the distribution of a dataset through its quartiles. It highlights the median, quartiles, and extreme values, providing a clear picture of the data’s spread and potential outliers.
Graphical Representation: This involves using visual tools like box plots, histograms, and scatter plots to summarize and analyze data, making it easier to identify patterns, trends, and extreme values in both univariate and multivariate datasets.
Extreme Values: These are the data points that are significantly higher or lower than the majority of the data. They can heavily influence the mean and standard deviation, and box plots and other graphical representations often highlight them.
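
As an illustration of the contingency table mentioned in the list above, the following Python sketch counts how often each pair of categories occurs. The survey records, category names, and variable names are all hypothetical.

```python
from collections import Counter

# Hypothetical survey records: (department, prefers remote work).
records = [
    ("Sales", "Yes"), ("Sales", "No"), ("Engineering", "Yes"),
    ("Engineering", "Yes"), ("Sales", "No"), ("Engineering", "No"),
]

# Count every (row category, column category) combination.
counts = Counter(records)

rows = sorted({dept for dept, _ in records})
cols = sorted({answer for _, answer in records})

# Build the table as a nested dict; missing combinations count as 0.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Print the table with tab-separated columns.
print("Department", *cols, sep="\t")
for r in rows:
    print(r, *(table[r][c] for c in cols), sep="\t")
```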

Can Descriptive Statistics be used to make inferences or predictions?

Descriptive statistics are not designed to make predictions, but they can lay the groundwork for them. Here’s the key difference:

Descriptive statistics summarize the data you have. They use measures like mean, median, and standard deviation to give you a general idea of what the data looks like. This process often involves exploratory data analysis, where open exploration of the data can reveal patterns and insights. For instance, calculating mean scores is a common part of this analysis.

Inferential statistics use the data you have to draw conclusions about a larger population. This allows you to make predictions about things you haven’t observed yet. Here, you would identify the dependent variable and independent variable in your study, which are crucial for making these inferences.

Think of it like this: Descriptive statistics describe your apartment, while inferential statistics use the features of your apartment to guess about the entire apartment building.

So, while descriptive statistics can’t directly predict the future, they help you understand the data and prepare it for inferential statistics, which you can then use for predictions. Summary statistics from your exploratory data analysis can provide the foundation for these predictive models.

Conclusion

In a world flooded with data, understanding, interpreting, and communicating information is paramount. Descriptive statistics doesn’t just crunch numbers; it crafts narratives, constructs visualizations, and empowers us to make informed decisions. Hope this article has given you a brief introduction to descriptive statistics. In this article, we have seen how the various measures of descriptive statistics, such as central tendency, dispersion, and shape of the data curve, help decipher the numbers. We have also bridged the gap between individual characteristics and the dance between variables by learning about univariate and bivariate data.

The article also covered measures of spread, such as the standard deviation, and the shape of the distribution.

Frequently Asked Questions

Q1. What is descriptive statistics with examples?

Ans. The methods that summarize and describe the main features of a dataset are called descriptive statistics. Measures of central tendencies, measures of variability, etc., which give information about the typical values in a dataset, are all examples of descriptive statistics.

Q2. What are the 5 descriptive statistics?

Ans. The 5 descriptive statistics include standard deviation, minimum and maximum values, variance, kurtosis, and skewness.

Q3. What are the 3 types of statistics?

Ans. The frequency distribution, central tendency, and variability of a dataset are the 3 main types of descriptive statistics.

Q4. What are the types of descriptive statistics?

Ans. Descriptive statistics are of 3 types: frequency distribution, central tendency, and variability.

The author uses the media shown in this article at their discretion, and Analytics Vidhya does not own it.

Illiyas

I am a Machine Learning professional with a strong background in Natural Language Processing (NLP). I am passionate about predictive modeling, data analysis, and deep learning, as they provide unique opportunities to uncover valuable insights from complex datasets.

Recently, my focus has been on Language Models (LLMs), an exciting area within NLP. I have been actively involved in researching, developing, and refining LLMs to enhance their capabilities and applicability in real-world scenarios. Through my work, I strive to advance the field of NLP and contribute to the development of intelligent systems that can understand and generate human-like language.

Sharing knowledge and collaborating with others is an essential part of my professional journey. I find great joy in exchanging ideas, insights, and expertise with fellow professionals and enthusiasts. By sharing my knowledge, I aim to contribute to the growth of the Machine Learning and NLP community, fostering an environment of continuous learning and innovation.
