The Measure of Central Tendencies in Statistics - A Beginner’s Guide

Shrish Last Updated : 08 Jul, 2021


This article was published as a part of the Data Science Blogathon

Statistics. Whenever I hear this term, I picture something like the image below.

Image is taken from www.mathnstuff.com

Looks pretty scary to me. In this blog, my aim is to introduce you to the measures of central tendency used in statistics in the easiest possible way. So, let’s get started.

Interesting Sidenote

Before starting with the measures of central tendency, let me share an interesting side note. Did you know that these measures were used as far back as World War 1 and World War 2?

To be precise, these measures were used to assess the damage that aircraft sustained during dogfights. They were also used to assess how many aircraft returned to the rendezvous points. Interesting, isn’t it?

Aircraft damage during World War 1 – Image is taken from www.theatlantic.com

Population and Sample

Before we start with the measures of central tendency, we need to understand a few core concepts. The first is the distinction between population and sample.

Population

By definition, a population is the collection of all data points of interest. Let’s understand this with the help of an example.

Example:- Let’s say we are conducting a survey of the employees of an organization. In this case, all the employees in the organization, taken together, form the population.

Sample

From a definition viewpoint, a sample is a subset of the population. Let’s get an understanding of this with the help of an example.

Example:- Let’s say we are conducting the same survey, but we only collect responses from the employees working on one particular project. In this case, the employees on that project form a sample.

Population Vs Sample – Image is taken from Scribbr.com

What to choose between Population and Sample?

Well, in most real-life scenarios, we deal with sample data. The reason is that a sample is easier to collect and cheaper to compute on than the full population. Based on the results obtained for a sample, we can then use statistical inference to draw conclusions about the entire population.

Parameter and Statistic

Now that we have a core understanding of the difference between population and sample, let’s quickly cover two more important concepts: parameter and statistic.

Parameter

By definition, a number that is obtained when working with a population is known as a parameter.

Example:- Let’s consider the same example, where we count the total number of employees working in an organization. After completing our survey, we arrive at the number 20000. In this context, 20000 is known as a parameter.

Statistic

By definition, a number that is obtained when working with a sample is known as a statistic.

Example:- Let’s consider the same example, where we count the total number of employees working on a particular project. After completing our survey, we arrive at the number 20. In this context, 20 is known as a statistic.

That’s why statistics is called statistics !!
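To make the distinction concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the employee ages are randomly generated, the names `population_ages` and `sample_ages` are hypothetical, and the sample is drawn at random rather than taken from a real project team.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of all 20000 employees in the organization.
population_ages = [random.randint(22, 60) for _ in range(20000)]

# Hypothetical sample: 20 of those employees (assumed to be drawn at random).
sample_ages = random.sample(population_ages, 20)

# A number computed from the whole population is a parameter.
population_mean = mean(population_ages)

# The same number computed from the sample is a statistic.
sample_mean = mean(sample_ages)

print(f"Parameter (population mean age): {population_mean:.2f}")
print(f"Statistic (sample mean age): {sample_mean:.2f}")
```

The two numbers will usually be close but not identical, which is exactly why we treat a statistic as an estimate of the parameter.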

Parameters Vs Statistics – Image is taken from dummies.com

The Measures of Central Tendency

Now that we have a core understanding of population and sample and how they map to parameters and statistics, you might have a question before we start with our main topic: “What is meant by central tendency?”

Well, the concept of central tendency is based on the fact below:

“Given a large number of observations of a similar type, most of the observations seem to cluster around a central position when represented as a graph.”

When such observations are plotted, most of them tend to cluster around the central position, hence the term central tendency.

Now let’s start with the measures of central tendency.

Mean

This is the first measure of central tendency. The mean, also known as the arithmetic mean, is the statistical average of all the data points in question.

Example:- Let’s consider the first 10 natural numbers – 1,2,3,4,5,6,7,8,9,10

In this case, the mean is the sum of all those numbers divided by how many numbers there are.

( 1+2+3+4+5+6+7+8+9+10)/10 = 5.5
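The same calculation can be sketched in Python. This is only an illustration: the variable names are my own, and `statistics.mean` from the standard library is shown alongside the manual formula as a cross-check.

```python
from statistics import mean

# The first 10 natural numbers from the example above.
numbers = list(range(1, 11))

# Mean by hand: sum of all values divided by the number of values.
manual_mean = sum(numbers) / len(numbers)

# Same result using the standard library.
library_mean = mean(numbers)

print(manual_mean)   # 5.5
print(library_mean)  # 5.5
```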

Advantages of mean:

1. Simplest measure of central tendency to understand

2. Easy to compute

Disadvantages of mean:

1. Heavily affected by the presence of outliers

You might have a question here – “What is an outlier?”

An outlier is a data point that is significantly different from the rest of the data points under consideration. So if a dataset has an extremely high or extremely low value, that value might be considered an outlier. There are multiple techniques to detect outliers, such as the box plot and the five-number summary, but those concepts are outside the scope of this article.

Example:- Let’s say we have 10 people in a room and we want to compute their average salary. We add up the salaries of all the people and divide by the number of people. Now imagine Jeff Bezos walks into the room. If we compute the mean again, it will be significantly higher than the one computed earlier. The reason is the presence of an outlier (Jeff Bezos in this case, since he has an extremely high salary).
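Here is a rough sketch of that room in Python. All the salary figures are made up for illustration, including the one billion used for the newcomer; they are not real data about anyone.

```python
# Hypothetical annual salaries of the 10 people already in the room.
salaries = [40_000, 45_000, 50_000, 52_000, 55_000,
            58_000, 60_000, 62_000, 65_000, 70_000]

mean_before = sum(salaries) / len(salaries)

# One extremely high earner walks in (an assumed, made-up figure).
salaries_with_outlier = salaries + [1_000_000_000]

mean_after = sum(salaries_with_outlier) / len(salaries_with_outlier)

print(f"Mean before the outlier: {mean_before:,.2f}")
print(f"Mean after the outlier:  {mean_after:,.2f}")
```

A single extra data point moves the mean from tens of thousands to tens of millions, even though the other ten salaries did not change at all.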

Median

This is the second measure of central tendency. The median is the middlemost data point in the dataset when it is arranged in ascending or descending order.

There are two cases to consider when computing the median.

Median with an even number of data points

If there is an even number of observations or data points, then the median is simply the average of the middle two numbers.

Example:- Let’s consider the first 10 natural numbers – 1,2,3,4,5,6,7,8,9,10

In this case, the median is the average of the middle two numbers, 5 and 6, which is 5.5.

Median with an odd number of data points

If there is an odd number of observations or data points, then the median is simply the middlemost observation.

Example:- Let’s consider the first 11 natural numbers – 1,2,3,4,5,6,7,8,9,10,11

In this case, the median is the middlemost observation, which is 6.
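Both cases can be sketched in a few lines of Python. The helper `manual_median` is a hypothetical function written just for this illustration; `statistics.median` from the standard library is used to cross-check it.

```python
from statistics import median

def manual_median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middlemost observation.
        return ordered[mid]
    # Even count: the average of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

even_data = list(range(1, 11))   # first 10 natural numbers
odd_data = list(range(1, 12))    # first 11 natural numbers

print(manual_median(even_data), median(even_data))  # 5.5 5.5
print(manual_median(odd_data), median(odd_data))    # 6 6
```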

Advantages of median:

1. More resistant to outliers than the mean

Disadvantages of median:

1. Data needs to be sorted in ascending or descending order first
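To see the resistance to outliers in action, here is a small sketch that compares how the mean and the median react to one extreme value. The salary figures are made up for illustration only.

```python
from statistics import mean, median

# Hypothetical annual salaries of 10 people.
salaries = [40_000, 45_000, 50_000, 52_000, 55_000,
            58_000, 60_000, 62_000, 65_000, 70_000]

# The same data plus one assumed extreme value.
salaries_with_outlier = salaries + [1_000_000_000]

mean_before = mean(salaries)
mean_after = mean(salaries_with_outlier)

median_before = median(salaries)
median_after = median(salaries_with_outlier)

print(f"Mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"Median: {median_before:,.0f} -> {median_after:,.0f}")
```

The median only shifts from 56,500 to 58,000, while the mean jumps by several orders of magnitude.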

Mode

This is the third and last measure of central tendency. The mode is the value that appears most often in the dataset.

Example:- Let’s say we have the following numbers – 1,2,3,4,4,4,4,4,4,5,6,7,8,9

Here we can clearly see that the number 4 is repeated the most times, and hence it is the mode in this case.
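The counting can be sketched in Python as follows. The variable names are my own; `collections.Counter` does the counting by hand, and `statistics.mode` is shown as a cross-check.

```python
from collections import Counter
from statistics import mode

# The numbers from the example above.
data = [1, 2, 3, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9]

# Count how many times each value appears.
counts = Counter(data)

# Pick the value with the highest count.
most_common_value, occurrences = counts.most_common(1)[0]

print(most_common_value, occurrences)  # 4 6
print(mode(data))                      # 4
```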

Advantages of mode:

1. More resistant to outliers than the mean and median

Disadvantages of mode:

1. Harder to interpret when the data has more than one mode
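The multiple-mode situation can be sketched with the standard library, assuming Python 3.8 or newer (where `statistics.multimode` is available). The dataset below is made up for illustration.

```python
from statistics import mode, multimode

# Hypothetical dataset in which 2 and 3 both appear twice.
bimodal = [1, 2, 2, 3, 3, 4]

# multimode returns every value tied for the highest count,
# in the order they first appear in the data.
all_modes = multimode(bimodal)

# mode returns only one of them (the first one encountered),
# which hides the fact that there is a tie.
single_mode = mode(bimodal)

print(all_modes)    # [2, 3]
print(single_mode)  # 2
```

When `multimode` returns more than one value, there is no single "most typical" value, so it is worth reporting all of them.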

Hope you liked this blog on measures of central tendency.

LinkedIn:

https://www.linkedin.com/in/shrish-mohadarkar-060209109/

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Shrish

My name is Shrish. I have been working as a data scientist at EY. I love technology, and during my free time I try to use my skills to create something awesome in Python so that it can be shared on the Analytics Vidhya platform.
