What are Mean and Variance of the Normal Distribution?

Janvi Kumari Last Updated : 26 Nov, 2024

The normal distribution, also known as the Gaussian distribution, is one of the most widely used probability distributions in statistics and machine learning. Understanding its core properties, mean and variance, is important for interpreting data and modelling real-world phenomena. In this article, we will dig into the concepts of mean and variance as they relate to the normal distribution, exploring their significance and how they define the shape and behaviour of this ubiquitous probability distribution.


What is a Normal Distribution?

A normal distribution is a continuous probability distribution characterized by its bell-shaped curve, symmetric around its mean (μ). The equation defining its probability density function (PDF) is:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Where:

  • μ: the mean (center of the distribution),
  • σ²: the variance (spread of the distribution),
  • σ: the standard deviation (square root of the variance).
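As a minimal sketch, the density above can be written out term by term and checked against Python's standard library. The function name normal_pdf and the parameters μ=0, σ=1 are illustrative assumptions, not part of the original article.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    variance = sigma ** 2
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    exponent = -((x - mu) ** 2) / (2.0 * variance)
    return coefficient * math.exp(exponent)

# Illustrative parameters (assumed for this sketch)
mu, sigma = 0.0, 1.0

peak = normal_pdf(mu, mu, sigma)           # density at the mean
reference = NormalDist(mu, sigma).pdf(mu)  # standard-library value

print(f"Peak density: {peak:.4f}")
print(f"Matches statistics.NormalDist: {math.isclose(peak, reference)}")
```

The density is largest at x = μ, where the exponent is zero and only the coefficient 1/√(2πσ²) remains.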

Mean of the Normal Distribution

The mean (μ) is the central value of the distribution. It indicates the location of the peak and acts as a balance point where the distribution is symmetric.

Key points about the mean:

  1. The distribution is symmetric about μ: values are equally likely to fall a given distance above or below it.
  2. In real-world data, μ often represents the “average” of a dataset.
  3. For a normal distribution, about 68% of the data lies within one standard deviation (μ±σ).

Example: If a dataset of heights has a normal distribution with μ=170 cm, the average height is 170 cm, and the distribution is symmetric around this value.
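The symmetry and balance-point properties can be checked numerically. This is a small sketch using the standard library's NormalDist; μ=170 comes from the example, while σ=5 cm and the 8 cm offset are assumed values chosen only for illustration.

```python
from statistics import NormalDist

# mu is from the heights example; sigma = 5 cm is an assumed value
heights = NormalDist(mu=170, sigma=5)

# Symmetry: points the same distance either side of the mean have equal density
left_density = heights.pdf(170 - 8)
right_density = heights.pdf(170 + 8)

# Balance point: half of the probability lies below the mean
share_below_mean = heights.cdf(170)

print(f"Density at 162 cm: {left_density:.5f}")
print(f"Density at 178 cm: {right_density:.5f}")
print(f"Share of heights below 170 cm: {share_below_mean:.2f}")
```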


Variance of the Normal Distribution

The variance (σ²) quantifies the spread of data around the mean. A smaller variance indicates that the data points are closely clustered around μ, while a larger variance suggests a wider spread.

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

Key points about variance:

  1. Variance is the average squared deviation of the individual data points x_i from the mean.
  2. The standard deviation (σ) is the square root of the variance, making it easier to interpret in the same units as the data.
  3. Variance controls the “width” of the bell curve. For higher variance:
    • The curve becomes flatter and wider.
    • Data is more dispersed.

Example: If the heights dataset has σ²=25, the standard deviation (σ) is 5 cm, meaning about 68% of heights fall within 170±5 cm (between 165 cm and 175 cm).
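To make the heights example concrete, the sketch below computes the share of the distribution within one, two, and three standard deviations of the mean. The helper name share_within is hypothetical; the parameters μ=170 and σ=5 are those of the example.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=5)  # sigma is the square root of 25

def share_within(dist, k):
    """Probability of falling within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    low, high = 170 - 5 * k, 170 + 5 * k
    print(f"{low} to {high} cm: {share_within(heights, k):.1%}")
```

The three shares come out at roughly 68%, 95%, and 99.7%, which is why the standard deviation is a convenient unit for describing spread.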


Relationship Between Mean and Variance

  1. Independent properties: Mean and variance are independent parameters of the normal distribution. Adjusting μ shifts the curve left or right without changing its shape, while adjusting σ² changes the spread without moving the centre.
  2. Data insights: Together, these parameters define the overall structure of the distribution and are critical for predictive modelling, hypothesis testing, and decision-making.
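A short sketch of this independence, using the standard library's NormalDist: adding a constant moves the centre and leaves the variance alone, while doubling σ leaves the centre alone and quadruples the variance. The starting values (μ=170, σ=5) and the names shifted and widened are assumptions for illustration.

```python
from statistics import NormalDist

base = NormalDist(mu=170, sigma=5)

# Adding a constant shifts the centre; the spread is untouched
shifted = base + 10

# Doubling sigma widens the curve; the centre is untouched
widened = NormalDist(mu=base.mean, sigma=base.stdev * 2)

print(f"base:    mean = {base.mean}, variance = {base.variance}")
print(f"shifted: mean = {shifted.mean}, variance = {shifted.variance}")
print(f"widened: mean = {widened.mean}, variance = {widened.variance}")
```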

Practical Applications

Here are the practical applications:

  1. Data Analysis: Many natural phenomena (e.g., heights, test scores) are approximately normally distributed, allowing for straightforward analysis using μ and σ².
  2. Machine Learning: In algorithms like Gaussian Naive Bayes, the mean and variance play a crucial role in modeling class probabilities.
  3. Standardization: Subtracting μ and dividing by σ converts data to z-scores with mean 0 and variance 1, which puts variables measured on different scales on a common footing for comparison.
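The standardization step can be sketched as follows. The scores list is hypothetical data invented for this example, and the population standard deviation (pstdev) is used so that the resulting variance is exactly 1 up to rounding.

```python
import statistics

scores = [52, 61, 70, 74, 83, 90]  # hypothetical test scores

mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)  # population standard deviation

# z-score: how many standard deviations each value lies from the mean
z_scores = [(x - mu) / sigma for x in scores]

z_mean = statistics.mean(z_scores)
z_variance = statistics.pvariance(z_scores)

print("z-scores:", [round(z, 2) for z in z_scores])
print(f"Mean is approximately 0: {abs(z_mean) < 1e-9}")
print(f"Variance is approximately 1: {abs(z_variance - 1) < 1e-9}")
```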

Visualizing the Impact of Mean and Variance

  1. Changing the Mean: The peak of the distribution shifts horizontally.
  2. Changing the Variance: The curve widens or narrows. A smaller σ² results in a taller peak, while a larger σ² flattens the curve.
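The link between variance and peak height follows from the PDF: at x = μ the density equals 1/√(2πσ²), so the peak drops as the variance grows. A minimal sketch, with the function name peak_height and the variances 1, 4, and 9 chosen for illustration:

```python
import math

def peak_height(variance):
    """Maximum of the normal PDF, which is reached at x = mu."""
    return 1.0 / math.sqrt(2.0 * math.pi * variance)

for variance in (1, 4, 9):
    print(f"variance = {variance}: peak density = {peak_height(variance):.4f}")
```

Because the total area under the curve must stay equal to 1, a wider curve is necessarily a lower one.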

Implementation in Python

Now let’s see how to calculate the mean and variance, and how to visualize their impact, using Python:

1. Calculate the Mean

The mean is calculated by summing all data points and dividing by the number of points. Here’s how to do it step-by-step in Python:

Step 1: Define the dataset

data = [4, 8, 6, 5, 9]

Step 2: Calculate the sum of the data

total_sum = sum(data)

Step 3: Count the number of data points

n = len(data)

Step 4: Compute the mean

mean = total_sum / n
print(f"Mean: {mean}")
Mean: 6.4

Alternatively, we can use the built-in mean function in the statistics module to calculate the mean directly:

import statistics
# Define the dataset
data = [4, 8, 6, 5, 9]
# Calculate the mean using the built-in function
mean = statistics.mean(data)
print(f"Mean: {mean}")
Mean: 6.4

2. Calculate the Variance

The variance measures the spread of data around the mean. Follow these steps:

Step 1: Calculate deviations from the mean

deviations = [(x - mean) for x in data]

Step 2: Square each deviation

squared_deviations = [dev**2 for dev in deviations]

Step 3: Sum the squared deviations

sum_squared_deviations = sum(squared_deviations)

Step 4: Compute the variance

variance = sum_squared_deviations / n
print(f"Variance: {variance}")
Variance: 3.44

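We can also use the built-in functions in the statistics module. Note that statistics.variance returns the sample variance (dividing by n − 1), so to reproduce the population variance computed above we use statistics.pvariance:

import statistics
# Define the dataset
data = [4, 8, 6, 5, 9]
# Calculate the population variance using the built-in function
variance = statistics.pvariance(data)
print(f"Variance: {variance}")
Variance: 3.44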
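The statistics module offers two variance functions, and they differ in the divisor. The sketch below runs both on the same dataset: pvariance divides by n (treating the data as the whole population), while variance divides by n − 1 (treating the data as a sample).

```python
import statistics

data = [4, 8, 6, 5, 9]
n = len(data)

population_variance = statistics.pvariance(data)  # divides by n
sample_variance = statistics.variance(data)       # divides by n - 1

print(f"Population variance: {population_variance}")
print(f"Sample variance: {sample_variance}")
```

The two results are related by a factor of n / (n − 1), so the gap shrinks as the dataset grows.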

3. Visualize the Impact of Mean and Variance

Now, let’s visualize how changing the mean and variance affects the shape of a normal distribution:

Code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

Step 1: Define a range of x values

x = np.linspace(-10, 20, 1000)

Step 2: Define distributions with different means (mu) but same variance

means = [0, 5, 10]  # Different means
constant_variance = 4
constant_std_dev = np.sqrt(constant_variance)

Step 3: Define distributions with the same mean but different variances

constant_mean = 5
variances = [1, 4, 9]  # Different variances
std_devs = [np.sqrt(var) for var in variances]

Step 4: Plot distributions with varying means

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
for mu in means:
    y = norm.pdf(x, mu, constant_std_dev)  # Normal PDF
    plt.plot(x, y, label=f"Mean = {mu}, Variance = {constant_variance}")
plt.title("Impact of Changing the Mean (Constant Variance)", fontsize=14)
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.legend()
plt.grid()

Step 5: Plot distributions with varying variances

plt.subplot(1, 2, 2)
for var, std in zip(variances, std_devs):
    y = norm.pdf(x, constant_mean, std)  # Normal PDF
    plt.plot(x, y, label=f"Mean = {constant_mean}, Variance = {var}")
plt.title("Impact of Changing the Variance (Constant Mean)", fontsize=14)
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.legend()
plt.grid()
plt.tight_layout()
plt.show()
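Plot: two panels showing normal PDFs with different means (constant variance) and with different variances (constant mean)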


Inference from the graph

Impact of Changing the Mean:

  • The mean (μ) determines the central location of the distribution.
  • Observation: As the mean changes:
    • The entire curve shifts horizontally along the x-axis.
    • The overall shape (spread and height) remains unchanged because the variance is constant.
  • Conclusion: The mean affects where the distribution is centered but does not impact the spread or width of the curve.

Impact of Changing the Variance:

  • The variance (σ²) determines the spread or dispersion of the data.
  • Observation: As the variance changes:
    • A larger variance creates a wider and flatter curve, indicating more spread-out data.
    • A smaller variance creates a narrower and taller curve, indicating less spread and more concentration around the mean.
  • Conclusion: Variance affects how much the data is spread around the mean, influencing the width and height of the curve.

Key points:

  • The mean (μ) determines the centre of the normal distribution.
  • The variance (σ²) determines its spread.
  • Together, they provide a complete description of the normal distribution’s shape, allowing for precise data modeling.
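One way to see that μ and σ² fully describe a normal distribution is to draw random samples from it and recover both parameters from the data. This is a rough sketch: the seed, the sample size, and the choice of μ=5 and σ=2 (variance 4) are assumptions made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

mu, sigma = 5, 2  # mean 5 and variance 4
samples = [random.gauss(mu, sigma) for _ in range(50_000)]

sample_mean = statistics.fmean(samples)
sample_variance = statistics.variance(samples)

print(f"Sample mean: {sample_mean:.3f} (true mean: {mu})")
print(f"Sample variance: {sample_variance:.3f} (true variance: {sigma ** 2})")
```

The estimates will not match the true values exactly, but they should land close to them, and they get closer as the sample size increases.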

Common Mistakes When Interpreting Mean and Variance

  1. Misinterpreting Variance: Higher variance doesn’t always indicate worse data; it may reflect natural diversity in the dataset.
  2. Ignoring Outliers: Outliers can distort the mean and inflate the variance.
  3. Assuming Normality: Not all datasets are normally distributed, and applying mean/variance-based models to non-normal data can lead to errors.
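The effect of an outlier is easy to demonstrate. The sketch below takes the dataset used earlier in the article and appends a single extreme value (100, chosen arbitrarily for illustration), then compares the mean, median, and population variance before and after.

```python
import statistics

data = [4, 8, 6, 5, 9]
with_outlier = data + [100]  # one extreme value, added for illustration

for label, values in (("original", data), ("with outlier", with_outlier)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    variance = statistics.pvariance(values)
    print(f"{label}: mean = {mean:.2f}, median = {median:.2f}, variance = {variance:.2f}")
```

The median barely moves, while the mean and especially the variance change sharply, which is why it is worth checking for outliers before relying on μ and σ².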

Conclusion

The mean (μ) determines the centre of the normal distribution, while the variance (σ²) controls its spread. Adjusting the mean shifts the curve horizontally, whereas changing the variance alters its width and height. Together, they define the shape and behaviour of the distribution, making them essential for analyzing data, building models, and making informed decisions in statistics and machine learning.


Frequently Asked Questions

Q1. What is the role of the mean (μ) in the normal distribution?

Ans. The mean determines the centre of the distribution. It represents the point of symmetry and the average of the data.

Q2. How are mean and variance independent in a normal distribution?

Ans. The mean determines the central location of the distribution, while the variance controls its spread. Adjusting one does not affect the other.

Q3. How does changing the mean affect the distribution?

Ans. Changing the mean shifts the curve horizontally along the x-axis but does not alter its shape or spread.

Q4. What happens if the variance is zero?

Ans. If the variance is zero, all data points are identical, and the distribution collapses into a single point at the mean.

Q5. Why is understanding mean and variance important?

Ans. Mean and variance define the shape of the normal distribution and are essential for statistical analysis, predictive modelling, and understanding data variability.

Q6. How does variance affect data visualization?

Ans. Higher variance leads to a flatter, wider bell curve, showing more spread-out data, while lower variance results in a taller, narrower curve, indicating tighter clustering around the mean.

Hi, I am Janvi, a passionate data science enthusiast currently working at Analytics Vidhya. My journey into the world of data began with a deep curiosity about how we can extract meaningful insights from complex datasets.
