The normal distribution, also known as the Gaussian distribution, is one of the most widely used probability distributions in statistics and machine learning. Understanding its core properties, mean and variance, is important for interpreting data and modelling real-world phenomena. In this article, we will dig into the concepts of mean and variance as they relate to the normal distribution, exploring their significance and how they define the shape and behaviour of this ubiquitous probability distribution.
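A normal distribution is a continuous probability distribution characterized by its bell-shaped curve, symmetric around its mean (μ). The equation defining its probability density function (PDF) is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Where:
x is the value at which the density is evaluated, μ is the mean, σ is the standard deviation, and σ² is the variance.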
The mean (μ) is the central value of the distribution. It marks the location of the peak and is the balance point about which the distribution is symmetric.
Key points about the mean:
Example: If a dataset of heights has a normal distribution with μ=170 cm, the average height is 170 cm, and the distribution is symmetric around this value.
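To make the peak and the symmetry concrete, here is a minimal sketch that evaluates the normal density by hand for the heights example. The helper name normal_pdf and the standard deviation of 5 cm are assumptions chosen for illustration; only μ = 170 comes from the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

mu, sigma = 170.0, 5.0  # sigma is an assumed value for illustration

# The density is highest at the mean ...
peak = normal_pdf(mu, mu, sigma)
# ... and equal at points the same distance on either side of it.
left = normal_pdf(mu - 7.0, mu, sigma)
right = normal_pdf(mu + 7.0, mu, sigma)

print(f"Density at the mean (170 cm): {peak:.4f}")
print(f"Density at 163 cm: {left:.4f}, at 177 cm: {right:.4f}")
```

The two off-centre densities match because the PDF depends on x only through the squared distance (x − μ)².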
The variance (σ²) quantifies the spread of data around the mean. A smaller variance indicates that the data points are closely clustered around μ, while a larger variance suggests a wider spread.
Key points about variance:
Example: If the heights dataset has σ² = 25, the standard deviation (σ) is 5, meaning about 68% of heights fall within 170 ± 5 cm (between 165 cm and 175 cm).
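How much of the data lies within a given number of standard deviations of the mean can be checked directly. The following is a small sketch using NormalDist from Python's statistics module (available from Python 3.8), with the μ = 170 and σ = 5 values from the example; the variable names are illustrative.

```python
from statistics import NormalDist

# Heights example: mean 170 cm, variance 25, so standard deviation 5 cm
heights = NormalDist(mu=170, sigma=5)

coverage = {}
for k in (1, 2, 3):
    low, high = 170 - k * 5, 170 + k * 5
    # Probability mass between low and high is a difference of CDF values
    coverage[k] = heights.cdf(high) - heights.cdf(low)
    print(f"Within {k} standard deviation(s) ({low} to {high} cm): {coverage[k]:.4f}")
```

The three probabilities come out at roughly 0.68, 0.95 and 0.997, which is the familiar 68-95-99.7 rule.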
Here are the practical applications:
Now let's see how to calculate the mean and variance, and how to visualize their impact, using Python:
The mean is calculated by summing all data points and dividing by the number of points. Here's how to do it step-by-step in Python:
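# Step 1: define the dataset
data = [4, 8, 6, 5, 9]
# Step 2: sum the data points and count them
total_sum = sum(data)
n = len(data)
# Step 3: divide the sum by the number of points
mean = total_sum / n
print(f"Mean: {mean}")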
Mean: 6.4
Alternatively, we can use the mean function from the statistics module to calculate the mean directly:
import statistics
# Define the dataset
data = [4, 8, 6, 5, 9]
# Calculate the mean using the built-in function
mean = statistics.mean(data)
print(f"Mean: {mean}")
Mean: 6.4
The variance measures the spread of data around the mean. Follow these steps:
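# Step 1: deviation of each data point from the mean
deviations = [(x - mean) for x in data]
# Step 2: square each deviation and add them up
squared_deviations = [dev**2 for dev in deviations]
sum_squared_deviations = sum(squared_deviations)
# Step 3: divide by n (population variance)
variance = sum_squared_deviations / n
print(f"Variance: {variance}")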
Variance: 3.44
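We can also use the statistics module. Note that statistics.variance computes the sample variance (dividing by n − 1), which gives 4.3 for this dataset; statistics.pvariance computes the population variance (dividing by n) and matches the manual result above.
import statistics
# Define the dataset
data = [4, 8, 6, 5, 9]
# Calculate the population variance using the built-in function
variance = statistics.pvariance(data)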
print(f"Variance: {variance}")
Variance: 3.44
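The statistics module provides both a population and a sample variance, and the two differ only in the divisor. A short sketch comparing them on the same dataset (the variable names are illustrative):

```python
import statistics

data = [4, 8, 6, 5, 9]

# Population variance: sum of squared deviations divided by n
population_var = statistics.pvariance(data)
# Sample variance: sum of squared deviations divided by n - 1
sample_var = statistics.variance(data)

print(f"Population variance: {population_var}")
print(f"Sample variance: {sample_var}")
```

Here the sum of squared deviations is 17.2, so the population variance is 17.2 / 5 = 3.44 and the sample variance is 17.2 / 4 = 4.3. Which one is appropriate depends on whether the data is the whole population or a sample drawn from it.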
Now, let’s visualize how changing the mean and variance affects the shape of a normal distribution:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x = np.linspace(-10, 20, 1000)
means = [0, 5, 10] # Different means
constant_variance = 4
constant_std_dev = np.sqrt(constant_variance)
constant_mean = 5
variances = [1, 4, 9] # Different variances
std_devs = [np.sqrt(var) for var in variances]
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
for mu in means:
    y = norm.pdf(x, mu, constant_std_dev) # Normal PDF
    plt.plot(x, y, label=f"Mean = {mu}, Variance = {constant_variance}")
plt.title("Impact of Changing the Mean (Constant Variance)", fontsize=14)
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.legend()
plt.grid()
plt.subplot(1, 2, 2)
for var, std in zip(variances, std_devs):
    y = norm.pdf(x, constant_mean, std) # Normal PDF
    plt.plot(x, y, label=f"Mean = {constant_mean}, Variance = {var}")
plt.title("Impact of Changing the Variance (Constant Mean)", fontsize=14)
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.legend()
plt.grid()
plt.tight_layout()
plt.show()
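Impact of Changing the Mean: the curve shifts horizontally along the x-axis, while its shape and spread stay the same.
Impact of Changing the Variance: a higher variance gives a flatter, wider curve, while a lower variance gives a taller, narrower one.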
Key points:
The mean (μ) determines the centre of the normal distribution, while the variance (σ²) controls its spread. Adjusting the mean shifts the curve horizontally, whereas changing the variance alters its width and height. Together, they define the shape and behaviour of the distribution, making them essential for analyzing data, building models, and making informed decisions in statistics and machine learning.
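The link between variance and the height of the curve can be made explicit: at x = μ the exponential term of the PDF equals 1, so the peak height is 1 / (σ√(2π)). A minimal sketch, reusing the variances 1, 4 and 9 from the plotting code (the variable names are illustrative):

```python
import math

peak_heights = {}
for variance in (1, 4, 9):
    sigma = math.sqrt(variance)
    # Value of the normal PDF at x = mu, where the exponential term is 1
    peak_heights[variance] = 1 / (sigma * math.sqrt(2 * math.pi))
    print(f"Variance = {variance}: sigma = {sigma:.0f}, peak height = {peak_heights[variance]:.4f}")
```

Because the total area under the curve must stay at 1, a wider curve has to be lower: tripling σ cuts the peak height to a third.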
Ans. The mean determines the centre of the distribution. It represents the point of symmetry and the average of the data.
Ans. The mean determines the central location of the distribution, while the variance controls its spread. Adjusting one does not affect the other.
Ans. Changing the mean shifts the curve horizontally along the x-axis but does not alter its shape or spread.
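The same independence can be seen on sample data. The sketch below, using the small dataset from earlier in the article, shifts every point by a constant (which moves the mean only) and then stretches the points about the mean (which changes the variance only). The shift of 10 and the stretch factor of 2 are arbitrary values chosen for illustration.

```python
import statistics

data = [4, 8, 6, 5, 9]
center = statistics.mean(data)

# Shift: add the same constant to every point
shifted = [x + 10 for x in data]
# Stretch: double each point's distance from the mean
stretched = [center + 2 * (x - center) for x in data]

results = {}
for name, values in [("original", data), ("shifted", shifted), ("stretched", stretched)]:
    results[name] = (statistics.mean(values), statistics.pvariance(values))
    print(f"{name}: mean = {results[name][0]:.2f}, variance = {results[name][1]:.2f}")
```

Shifting leaves the variance at 3.44, while doubling the distances from the mean multiplies the variance by 2² = 4 and leaves the mean at 6.4.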
Ans. If the variance is zero, all data points are identical, and the distribution collapses into a single point at the mean.
Ans. The mean and variance define the shape of the normal distribution and are essential for statistical analysis, predictive modelling, and understanding data variability.
Ans. Higher variance leads to a flatter, wider bell curve, showing more spread-out data, while lower variance results in a taller, narrower curve, indicating tighter clustering around the mean.