What are Mean and Variance of the Normal Distribution?

Janvi Kumari Last Updated : 26 Nov, 2024


The normal distribution, also known as the Gaussian distribution, is one of the most widely used probability distributions in statistics and machine learning. Understanding its core properties, mean and variance, is important for interpreting data and modelling real-world phenomena. In this article, we will dig into the concepts of mean and variance as they relate to the normal distribution, exploring their significance and how they define the shape and behaviour of this ubiquitous probability distribution.


What is a Normal Distribution?

A normal distribution is a continuous probability distribution characterized by its bell-shaped curve, symmetric around its mean (μ). The equation defining its probability density function (PDF) is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Where:


μ: the mean (center of the distribution),
σ²: the variance (spread of the distribution),
σ: the standard deviation (square root of variance).
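As a minimal sketch, the density of a normal distribution, 1/(σ√(2π)) · exp(−(x − μ)² / (2σ²)), can be evaluated by hand and compared with Python's built-in `statistics.NormalDist`. The helper name `normal_pdf` and the values μ = 0, σ = 1 are illustrative assumptions, not part of the original article.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written out term by term."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


mu, sigma = 0.0, 1.0  # illustrative parameters (standard normal)
for x in (-1.0, 0.0, 1.0):
    by_hand = normal_pdf(x, mu, sigma)
    library = NormalDist(mu, sigma).pdf(x)
    print(f"x = {x:+.1f}: formula = {by_hand:.6f}, NormalDist = {library:.6f}")
```

Both columns should agree, and the density is highest at x = μ.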

Mean of the Normal Distribution

The mean (μ) is the central value of the distribution. It indicates the location of the peak and acts as a balance point where the distribution is symmetric.

Key points about the mean:

The distribution is symmetric around μ, so a value is equally likely to fall above or below it.
In real-world data, μ often represents the “average” of a dataset.
For a normal distribution, about 68% of the data lies within one standard deviation (μ±σ).

Example: If a dataset of heights has a normal distribution with μ=170 cm, the average height is 170 cm, and the distribution is symmetric around this value.
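The symmetry in this example can be checked numerically. The sketch below uses `statistics.NormalDist` with μ = 170 cm; the standard deviation of 5 cm is an assumed value chosen only for illustration.

```python
import math
from statistics import NormalDist

# mu = 170 cm comes from the example; sigma = 5 cm is an assumption.
heights = NormalDist(mu=170, sigma=5)

# Half of the probability mass lies below the mean.
below_mean = heights.cdf(170)
print(f"P(height < 170) = {below_mean:.2f}")

# Points equally far from the mean have the same density.
left, right = heights.pdf(165), heights.pdf(175)
print(f"pdf(165) = {left:.5f}, pdf(175) = {right:.5f}")
```

Equal densities at 165 cm and 175 cm are what "symmetric around the mean" means in practice.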


Variance of the Normal Distribution

The variance (σ²) quantifies the spread of data around the mean. A smaller variance indicates that the data points are closely clustered around μ, while a larger variance suggests a wider spread.

Key points about variance:

Variance is the average squared deviation from the mean, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, where $x_i$ are the individual data points and $N$ is the number of points.
The standard deviation (σ) is the square root of the variance, making it easier to interpret in the same units as the data.
Variance controls the “width” of the bell curve. For higher variance:
- The curve becomes flatter and wider.
- Data is more dispersed.

Example: If the heights dataset has σ²=25, the standard deviation (σ) is 5, meaning about 68% of heights fall within 170±5 cm (between 165 cm and 175 cm).
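To make the spread concrete, the following sketch computes how much probability lies within one, two, and three standard deviations of the mean for the heights example (μ = 170, σ = 5). The helper `prob_within` is a hypothetical name introduced here.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=5)  # sigma = sqrt(25)


def prob_within(dist, k):
    """Probability that a value lies within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(heights, k):.4f}")
```

The three probabilities are approximately 0.6827, 0.9545, and 0.9973, and they do not depend on the particular values of μ and σ.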


Relationship Between Mean and Variance

Independent properties: Mean and variance independently influence the shape of the normal distribution. Adjusting μ shifts the curve left or right, while adjusting σ² changes the spread.
Data insights: Together, these parameters define the overall structure of the distribution and are critical for predictive modelling, hypothesis testing, and decision-making.
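The independence of the two parameters can be illustrated on a small dataset: shifting every value changes only the mean, while stretching values about the mean changes only the variance. This is a sketch on made-up data; the variable names are illustrative.

```python
import statistics

data = [4, 8, 6, 5, 9]  # illustrative data
center = statistics.mean(data)

# Adding a constant moves every point by the same amount.
shifted = [x + 10 for x in data]

# Doubling each point's distance from the mean widens the data in place.
stretched = [center + 2 * (x - center) for x in data]

for label, values in (("original", data), ("shifted", shifted), ("stretched", stretched)):
    mean = statistics.mean(values)
    variance = statistics.pvariance(values)
    print(f"{label:>9}: mean = {mean:.2f}, variance = {variance:.2f}")
```

Shifting leaves the variance untouched, and stretching by a factor of 2 multiplies the variance by 4 while the mean stays put.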

Practical Applications

Here are the practical applications:

Data Analysis: Many natural phenomena (e.g., heights, test scores) approximately follow a normal distribution, allowing for straightforward analysis using μ and σ².
Machine Learning: In algorithms like Gaussian Naive Bayes, the mean and variance play a crucial role in modeling class probabilities.
Standardization: Transforming each value to a z-score, z = (x − μ)/σ, rescales the data to have μ=0 and σ²=1, which makes values from different normal distributions directly comparable.
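As a small sketch of standardization, the snippet below converts a dataset to z-scores using only the standard library and confirms that the result has mean 0 and variance 1. The data values are illustrative.

```python
import statistics

scores = [4, 8, 6, 5, 9]  # illustrative data

mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)  # population standard deviation

# z-score: how many standard deviations a value lies from the mean
z_scores = [(x - mu) / sigma for x in scores]

print("z-scores:", [round(z, 3) for z in z_scores])
print(f"mean of z-scores: {statistics.mean(z_scores):.6f}")
print(f"variance of z-scores: {statistics.pvariance(z_scores):.6f}")
```

The population standard deviation (`pstdev`) is used so that the standardized variance is exactly 1 up to rounding.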

Visualizing the Impact of Mean and Variance

Changing the Mean: The peak of the distribution shifts horizontally.
Changing the Variance: The curve widens or narrows. A smaller σ² results in a taller peak, while a larger σ² flattens the curve.
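The link between variance and peak height follows from the density at x = μ, which equals 1/√(2πσ²). A brief sketch, with `peak_height` as a hypothetical helper name:

```python
import math


def peak_height(variance):
    """Maximum of the normal PDF, reached at x = mu."""
    return 1.0 / math.sqrt(2.0 * math.pi * variance)


for variance in (1, 4, 9):
    print(f"variance = {variance}: peak height = {peak_height(variance):.4f}")
```

Because the area under the curve must stay equal to 1, a wider curve has to be lower: quadrupling the variance halves the peak.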

Implementation in Python

Now let’s see how to calculate the mean and variance, and how to visualize their impact, using Python:

1. Calculate the Mean

The mean is calculated by summing all data points and dividing by the number of points. Here’s how to do it step-by-step in Python:

Step 1: Define the dataset

data = [4, 8, 6, 5, 9]

Step 2: Calculate the sum of the data

total_sum = sum(data)

Step 3: Count the number of data points

n = len(data)

Step 4: Compute the mean

mean = total_sum / n
print(f"Mean: {mean}")

Mean: 6.4

Alternatively, we can use the built-in mean function in the statistics module to calculate the mean directly:

import statistics
# Define the dataset
data = [4, 8, 6, 5, 9]
# Calculate the mean using the built-in function
mean = statistics.mean(data)
print(f"Mean: {mean}")

Mean: 6.4

2. Calculate the Variance

The variance measures the spread of data around the mean. Follow these steps:

Step 1: Calculate deviations from the mean

deviations = [(x - mean) for x in data]

Step 2: Square each deviation

squared_deviations = [dev**2 for dev in deviations]

Step 3: Sum the squared deviations

sum_squared_deviations = sum(squared_deviations)

Step 4: Compute the variance

variance = sum_squared_deviations / n
print(f"Variance: {variance}")

Variance: 3.44

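We can also use the built-in pvariance function in the statistics module. It divides by n (population variance), so it matches the manual calculation above. Note that statistics.variance divides by n - 1 (sample variance) and would return 4.3 for this dataset.

import statistics
# Define the dataset
data = [4, 8, 6, 5, 9]
# Calculate the population variance using the built-in function
variance = statistics.pvariance(data)
print(f"Variance: {variance}")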

Variance: 3.44
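The statistics module offers two variance functions, and they differ only in the divisor. The sketch below runs both on the same dataset so the difference is visible.

```python
import statistics

data = [4, 8, 6, 5, 9]
n = len(data)

# Population variance: sum of squared deviations divided by n
population_variance = statistics.pvariance(data)

# Sample variance: sum of squared deviations divided by n - 1
sample_variance = statistics.variance(data)

print(f"Population variance: {population_variance}")
print(f"Sample variance: {sample_variance}")
print(f"Ratio: {sample_variance / population_variance:.2f} (equals n / (n - 1))")
```

Use the population version when the data is the entire group of interest, and the sample version when the data is a sample used to estimate the variance of a larger population.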

3. Visualize the Impact of Mean and Variance

Now, let’s visualize how changing the mean and variance affects the shape of a normal distribution:

Code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

Step 1: Define a range of x values

x = np.linspace(-10, 20, 1000)

Step 2: Define distributions with different means (mu) but same variance

means = [0, 5, 10]  # Different means
constant_variance = 4
constant_std_dev = np.sqrt(constant_variance)

Step 3: Define distributions with the same mean but different variances

constant_mean = 5
variances = [1, 4, 9]  # Different variances
std_devs = [np.sqrt(var) for var in variances]

Step 4: Plot distributions with varying means

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
for mu in means:
    y = norm.pdf(x, mu, constant_std_dev)  # Normal PDF
    plt.plot(x, y, label=f"Mean = {mu}, Variance = {constant_variance}")
plt.title("Impact of Changing the Mean (Constant Variance)", fontsize=14)
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.legend()
plt.grid()

Step 5: Plot distributions with varying variances

plt.subplot(1, 2, 2)
for var, std in zip(variances, std_devs):
    y = norm.pdf(x, constant_mean, std)  # Normal PDF
    plt.plot(x, y, label=f"Mean = {constant_mean}, Variance = {var}")
plt.title("Impact of Changing the Variance (Constant Mean)", fontsize=14)
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.legend()
plt.grid()
plt.tight_layout()
plt.show()


Inference from the graph

Impact of Changing the Mean:

The mean (μ) determines the central location of the distribution.
Observation: As the mean changes:
- The entire curve shifts horizontally along the x-axis.
- The overall shape (spread and height) remains unchanged because the variance is constant.
Conclusion: The mean affects where the distribution is centered but does not impact the spread or width of the curve.

Impact of Changing the Variance:

The variance (σ²) determines the spread or dispersion of the data.
Observation: As the variance changes:
- A larger variance creates a wider and flatter curve, indicating more spread-out data.
- A smaller variance creates a narrower and taller curve, indicating less spread and more concentration around the mean.
Conclusion: Variance affects how much the data is spread around the mean, influencing the width and height of the curve.

Key points:

The mean (μ) determines the centre of the normal distribution.
The variance (σ²) determines its spread.
Together, they provide a complete description of the normal distribution’s shape, allowing for precise data modeling.

Common Mistakes When Interpreting Mean and Variance

Misinterpreting Variance: Higher variance doesn’t always indicate worse data; it may reflect natural diversity in the dataset.
Ignoring Outliers: Outliers can distort the mean and inflate the variance.
Assuming Normality: Not all datasets are normally distributed, and applying mean/variance-based models to non-normal data can lead to errors.
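The effect of an outlier can be seen on a tiny dataset. The sketch below uses hypothetical height values and adds one implausible entry; the median is printed alongside as a reference point that is less sensitive to extreme values.

```python
import statistics

heights = [165, 168, 170, 172, 175]  # hypothetical heights in cm
with_outlier = heights + [250]       # one implausible entry added

for label, values in (("clean", heights), ("with outlier", with_outlier)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    variance = statistics.pvariance(values)
    print(f"{label:>12}: mean = {mean:.1f}, median = {median:.1f}, variance = {variance:.1f}")
```

A single extreme value pulls the mean upward and inflates the variance many times over, while the median barely moves.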

Conclusion

The mean (μ) determines the centre of the normal distribution, while the variance (σ²) controls its spread. Adjusting the mean shifts the curve horizontally, whereas changing the variance alters its width and height. Together, they define the shape and behaviour of the distribution, making them essential for analyzing data, building models, and making informed decisions in statistics and machine learning.


Frequently Asked Questions

Q1. What is the role of the mean (μ) in the normal distribution?

Ans. The mean determines the centre of the distribution. It represents the point of symmetry and the average of the data.

Q2. How are mean and variance independent in a normal distribution?

Ans. The mean determines the central location of the distribution, while the variance controls its spread. Adjusting one does not affect the other.

Q3. How does changing the mean affect the distribution?

Ans. Changing the mean shifts the curve horizontally along the x-axis but does not alter its shape or spread.

Q4. What happens if the variance is zero?

Ans. If the variance is zero, all data points are identical, and the distribution collapses into a single point at the mean.

Q5. Why is understanding mean and variance important?

Ans. Mean and variance define the shape of the normal distribution and are essential for statistical analysis, predictive modelling, and understanding data variability.

Q6. How does variance affect data visualization?

Ans. Higher variance leads to a flatter, wider bell curve, showing more spread-out data, while lower variance results in a taller, narrower curve, indicating tighter clustering around the mean.

Janvi Kumari

Hi, I am Janvi, a passionate data science enthusiast currently working at Analytics Vidhya. My journey into the world of data began with a deep curiosity about how we can extract meaningful insights from complex datasets.
