Estimators – An Introduction to Beginners in Data Science

Naman Last Updated : 01 Jul, 2021

This article was published as a part of the Data Science Blogathon.

Introduction

Not having much information about the distribution of a random variable can become a major problem for data scientists and statisticians. Consider a researcher trying to understand the distribution of choco chips in a cookie (a very popular example of the Poisson distribution). The researcher is well aware that the number of choco chips follows a Poisson distribution, but does not know how to estimate the parameter λ of that distribution.

A parameter is essentially a numerical characteristic of a distribution (or any statistical model in general). Normal distributions have µ & σ as parameters, uniform distributions have a & b as parameters, and binomial distributions have n & p as parameters. These numerical characteristics are vital for understanding the size, shape, spread, and other properties of a distribution. In the absence of the true value of the parameter, it seems that the researcher may not be able to continue her investigation. But that’s when estimators step in.

Estimators are functions of random variables that can help us find approximate values for these parameters. Think of an estimator like any other function: it takes an input, processes it, and renders an output. So, the process of estimation goes as follows:

1) From the distribution, we take a series of random samples.

2) We input these random samples into the estimator function.

3) The estimator function processes it and gives a set of outputs.

4) The expected value of that set is the approximate value of the parameter.
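
To make these four steps concrete, here is a minimal Python sketch (assuming NumPy is available) for the choco-chip researcher from the introduction. The values true_lambda = 6, n = 30, and num_repeats = 1000 are made-up choices for illustration, and the sample mean is used as the estimator for λ:

import numpy as np

rng = np.random.default_rng(seed=42)

true_lambda = 6.0     # the parameter we pretend not to know; fixed here so we can check the estimate
n = 30                # each random sample counts the chips in 30 cookies
num_repeats = 1000    # how many samples we feed into the estimator

estimates = []
for _ in range(num_repeats):
    sample = rng.poisson(lam=true_lambda, size=n)   # step 1: draw a random sample
    estimates.append(sample.mean())                 # steps 2 & 3: the estimator (sample mean) gives one output

print(np.mean(estimates))   # step 4: the average of the outputs lands very close to 6, our estimate of λ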

Example

Let’s take an example. Consider a random variable X that follows a uniform distribution, represented as U[0, θ]. This has been plotted below:

(Figure A: plot of the distribution U[0, θ])

We have the random variable X and its distribution. But we don’t know how to determine the value of θ. Let’s use estimators. There are many ways to approach this problem. I’ll discuss two of them:

 

1) Using Sample Mean

We know that for a U[a, b] distribution, the mean µ is given by the following equation:

E(X) = µ = (a + b) / 2

For the U[0, θ] distribution, with a = 0 & b = θ, we get:

µ = (0 + θ) / 2 = θ / 2, i.e., θ = 2µ

Thus, if we estimate µ, we can estimate θ. To estimate µ, we use a very popular estimator called the sample mean estimator. The sample mean is the sum of the values drawn in the random sample divided by the size of the sample. For instance, if we have a random sample S = {4, 7, 3, 2}, then the sample mean is (4 + 7 + 3 + 2)/4 = 4 (the average value). In general, the sample mean is defined using the following notation:

µ-hat = (X1 + X2 + … + Xn) / n

Here, µ-hat is the sample mean estimator & n is the size of the random sample that we take from the distribution. A variable with a hat on top of it is the general notation for an estimator. Since our unknown parameter θ is twice µ, we arrive at the following estimator for θ:

θ-hat = 2 · µ-hat = 2 · (X1 + X2 + … + Xn) / n

We take a random sample, plug it into the above estimator, and get a number. We repeat this process and get a set of numbers. The following figure illustrates the process:

(Figure B: samples drawn from U[0, θ], with their sample means and the corresponding estimates θ-hat = 2 · µ-hat marked in red)

The lines on the x-axes correspond to the values present in the sample taken from the distribution. The red lines in the middle indicate the average value of the sample, and the red lines at the end are twice that average value, i.e., the estimate of θ from that one sample. Many such samples are taken, and the estimated value of θ from each sample is noted. The expected value (mean) of that set of numbers gives the final estimate of θ. It can be mathematically proved (using properties of expectation) that:

E(θ-hat) = E(2 · µ-hat) = 2 · E(µ-hat) = 2µ = θ

It is seen that the expectation of the estimator is equal to the true value of the parameter. This amazing property that certain estimators have is called unbiasedness, which is a very useful criterion for assessing estimators.
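
As a quick, informal check of this unbiasedness claim (not part of the original derivation), the following Python sketch repeats the procedure of Figure B many times. The choices θ = 10, n = 20, and num_repeats = 5000 are arbitrary values for the demonstration:

import numpy as np

rng = np.random.default_rng(seed=0)

theta = 10.0         # true parameter, chosen arbitrarily for the demo
n = 20               # size of each random sample
num_repeats = 5000   # number of samples drawn

samples = rng.uniform(low=0.0, high=theta, size=(num_repeats, n))
theta_hats = 2 * samples.mean(axis=1)    # θ-hat = 2 · sample mean, one estimate per sample

print(theta_hats.mean())   # ≈ 10.0, in line with E(θ-hat) = θ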

 

2) Maximum Value Method

This time, instead of using the mean, we’ll use order statistics, particularly the nth order statistic. The nth order statistic is defined as the nth smallest value of a random sample of size n. In other words, it’s the maximum value of the random sample. For instance, if we have a random sample S = {4, 7, 3, 2}, then the nth order statistic is 7 (the largest value). The estimator is now defined as follows:

θ-hat = X(n) = max(X1, X2, …, Xn)

We follow the same procedure: take random samples, feed them into the estimator, collect the outputs, and find their expectation. The following figure illustrates the process:

(Figure C: two samples drawn from U[0, θ], with the maximum value of each sample marked in red as the estimate θ-hat)

As noted previously, the lines on the x-axes are the values present in one sample. The red lines at the end are the maximum value of that sample, i.e., the nth order statistic. Two random samples are shown for reference. However, we need to take much larger samples. To see why, we’ll use the general expression for the PDF (probability density function) of the nth order statistic of a U[a, b] distribution:

f(x) = n · (x − a)^(n − 1) / (b − a)^n, for a ≤ x ≤ b

For the U[0, θ] distribution, with a = 0 & b = θ, we get:

f(x) = n · x^(n − 1) / θ^n, for 0 ≤ x ≤ θ

Using the integral form of expectation of a continuous variable,

E(θ-hat) = ∫₀^θ x · f(x) dx = ∫₀^θ n · x^n / θ^n dx = (n / (n + 1)) · θ

Unlike before, no exact equality holds between the expectation of θ-hat and θ. This is because the nth order statistic estimator is biased. We can observe this bias by comparing figures B & C. In figure B, both positive (θ-hat > θ) and negative (θ-hat < θ) deviations are possible, and on average they cancel each other out, making the sample mean estimator unbiased. However, in figure C, only negative deviations are possible. Why? Because the maximum value of a random sample can never exceed θ, the upper end of the range [0, θ] of the distribution. Consequently, a negative bias steps in.

Does that mean that we cannot use this estimator? Certainly not. As discussed earlier, the estimator’s bias can be significantly lowered by taking a large n. For large values of n, n / (n + 1) ≈ 1. Thus, we get:

E(θ-hat) = (n / (n + 1)) · θ ≈ θ
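
A short simulation (again with an illustrative θ = 10, chosen only for this demo) shows both effects: for a small n, the average of the maximum-value estimates sits close to n/(n + 1) · θ, visibly below θ, while for a large n the gap nearly vanishes. The last column applies the standard bias correction of multiplying the estimator by (n + 1)/n, which removes the bias in expectation:

import numpy as np

rng = np.random.default_rng(seed=1)

theta = 10.0          # true parameter, arbitrary for the demo
num_repeats = 20000   # number of random samples per sample size

for n in (5, 1000):
    samples = rng.uniform(0.0, theta, size=(num_repeats, n))
    theta_hats = samples.max(axis=1)        # θ-hat = nth order statistic (sample maximum)
    corrected = (n + 1) / n * theta_hats    # bias-corrected variant
    print(n, theta_hats.mean(), corrected.mean())
    # n = 5:    mean ≈ 8.33 (≈ 5/6 · 10), a clear negative bias; corrected mean ≈ 10
    # n = 1000: mean ≈ 9.99, the bias has practically disappeared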

The Bottom Line

Hence, we have successfully solved our problem through estimators. We also learned a very important property of estimators: unbiasedness. While this may have been an extensive read, it’s important to acknowledge that the study of estimators is not restricted to the concepts explained above. Various other properties of estimators, such as efficiency, robustness, mean squared error, and consistency, are also vital to deepening our understanding of them.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

 

I am currently a first-year undergraduate student at the National University of Singapore (NUS), who is deeply interested in Statistics, Data Science, Economics and Machine Learning. I love working on different Data Science projects to enhance my analytical and inferential skills. I find it important to share my learning with other members of the community by simplifying data science in such a way that young minds can understand and local leaders can implement.
