A Comprehensive Guide to OLS Regression: Part-1

Prashant Sahu Last Updated : 28 Nov, 2024

10 min read

Ordinary Least squares is an optimization technique.OLS is the same technique that the scikit-learn LinearRegression class and the numpy.polyfit() function use behind the scenes. Before we proceed into the details of the OLS technique, it would be worthwhile going through the article I have written on the role of Optimization techniques in machine learning & deep learning. In the same article, I have briefly explained the reason and context for the existence of the OLS technique (Section 6). This article continues the previous one, and I expect readers to be familiar with it.

Ordinary Least Squares (OLS) regression, commonly referred to as OLS, serves as a fundamental statistical method to model the relationship between a dependent variable and one or more independent variables. The OLS model minimizes the sum of the squared differences between observed and predicted values, ensuring the best fit for the data. OLS linear regression finds wide application in various fields, including economics and social sciences, providing valuable insights into data patterns and helping researchers make informed decisions based on their analyses. So here We have given What you learn in this article!

Learning Objectives:

Learn what OLS is and understand its mathematical equation
Get an overview of OLS in scaler form and its drawbacks
Understand OLS using a real-time example

What is the OLS Regression Model?
What are Optimization Problems?
Why Do We Need OLS?
OLS Solution in Scaler Form
OLS in Action Using an Actual Example
Problems with the Scaler Form of OLS Solution
Conclusion
- Key Takeaway

What is the OLS Regression Model?

OLS regression is a statistical method utilized for parameter estimation in linear regression models. Ordinary least squares (OLS) aim to find the optimal line that minimizes the total squared differences between the actual and estimated values of the dependent variable.

The key components of OLS Linear Regression are:.

It demonstrates the linear relationship between a response variable (y) and one or more predictor variables (x).
The linear equation is y = β0 + β1×1 + β2×2 + … + βpxp + ε, where β0 is the intercept, β1 to βp are the coefficients for x1 to xp, and ε is the error term.
OLS chooses β0, β1, …, βp to minimize the sum of squared differences between the observed y values and the predicted y values from the regression line.
If the OLS estimators meet certain conditions like linearity, lack of multicollinearity, homoscedasticity, absence of autocorrelation, and normality of errors, they will be unbiased, consistent, and have the lowest variance among linear unbiased estimators.

What are Optimization Problems?

Optimization problems are mathematical problems that involve finding the best solution from a set of possible solutions. These problems typically present themselves as maximization or minimization problems, aiming to maximize or minimize a certain objective function. The objective function serves as a mathematical expression that describes the quantity to optimize, while a set of constraints defines the possible solutions.

Optimization problems arise in various fields, including engineering, finance, economics, and operations research. They are used to model and solve problems such as resource allocation, scheduling, and portfolio optimization. Optimization is a crucial component of many machine learning algorithms. In machine learning, optimization helps find the best set of parameters for a model that minimizes the difference between the model’s predictions and the true values. Researchers actively explore optimization as a key area in machine learning, developing new algorithms to improve the speed and accuracy of training models.

Examples

Some examples of where optimization is used in machine learning include:

In supervised learning, optimization is used to find the parameters of a model that minimize the difference between the model’s predictions and the true values for a given training dataset. For example, linear regression and logistic regression use optimization to find the best values of the model’s coefficients. In addition, some models, like decision trees, random forests, and gradient boosting models, build by iteratively adding new models to the ensemble and optimizing the parameters of the new models to minimize the error on the training data.
In unsupervised learning, optimization helps to find the best configuration of clusters or mapping of the data that best represents the underlying structure in the data. In clustering, optimization is used to find the best cluster configuration in the data. For example, the K-Means algorithm uses an optimization technique called Lloyd’s algorithm, which iteratively reassigns data points to the nearest cluster centroid and updates the cluster centroids based on the newly assigned points. Similarly, other clustering algorithms, such as hierarchical, density-based, and Gaussian mixture models, also use optimization techniques to find the best clustering solution. In dimensionality reduction, optimization finds the best data mapping from a high- to a lower-dimensional space. For example, Principal Component Analysis (PCA) uses Singular Value Decomposition (SVD), an optimization technique, to find the best linear combination of the original variables that explains the most variance in the data. Other dimensionality reduction techniques like Linear Discriminant Analysis (LDA) and t-distributed Stochastic Neighbor Embedding (t-SNE) also use optimization techniques to find the best data representation in a lower-dimensional space.
In deep learning, optimization helps find the best set of parameters for neural networks, typically using gradient-based optimization algorithms such as stochastic gradient descent (SGD) or Adam/Adagrad/RMSProp.

Why Do We Need OLS?

The ordinary least squares (OLS) algorithm is a method for estimating the parameters of a linear regression model. The OLS algorithm aims to find the values of the linear regression model’s parameters (i.e., the coefficients) that minimize the sum of the squared residuals. The residuals are the differences between the observed values of the dependent variable and the predicted values of the dependent variable given the independent variables. It is important to note that the OLS algorithm assumes the errors follow a normal distribution with zero mean and constant variance, and it assumes no multicollinearity (high correlation) among the independent variables. Use other methods, such as Generalized Least Squares or Weighted Least Squares, when these assumptions are unmet.

Understanding the Mathematics behind the OLS Algorithm

To explain the OLS algorithm, let me take the simplest possible example. Consider the following 3 data points:

Everyone familiar with machine learning will immediately recognize that we are referring to X1 as the independent variable (also called “Features” or “Attributes”), and the Y is the dependent variable (also referred to as the “Target” or “Outcome”). Hence, the overall task of any machine is to find the relationship between X1 & Y. This relationship is actually “learned” by the machine from the DATA. Hence, we call the term Machine Learning. We, humans, learn from our experiences. Similarly, the same experience is fed into the machine as data.

Finding Best-Fit Line

We want to find the best-fit line through the above 3 data points. The following plot shows these 3 data points in blue circles. Also shown is the red line (with squares), which we claim is the “best-fit line” through these 3 data points. I have also shown a “poor-fitting” line (the yellow line) for comparison.

The net objective is to find the Equation of the Best-Fitting Straight Line (through these 3 data points mentioned in the above table).

It is the equation of the best-fit line (red line in the above plot), where w1 = slope of the line; w0 = intercept of the line. In machine learning, this best fit is called the Linear Regression (LR) model, and w0 and w1 are also called model weights or model coefficients.

Red-squares in the above plot represent the predicted values from the Linear Regression model (Y^). Of course, the predicted values are NOT the same as the actual values of Y (blue circles). The vertical difference represents the error in the prediction given by (see the image below) for any ith data point.

Now, I claim that this best-fit line will have the minimum error for prediction (among all possible infinite random “poor-fit” lines). This total error across all the data points is expressed as the Mean Squared Error (MSE) Function, which will be the minimum for the best-fit line.

N = Total no. of data points in the dataset (in the current case, it is 3)
Minimizing or maximizing any quantity mathematically refers to an Optimization Problem, and the solution (the point where the minimum or maximum exists) indicates the optimal values of the variables.

Linear Regression

Linear Regression is an example of unconstrained optimization, given by:

———– (4)

This is read as “Find the optimal weights (w_j) for which the MSE Loss function (given in eq. 3 above) has min value, for a GIVEN X, Y data” (refer to very first table at the start of the article). L(w_j) represents the MSE Loss, a function of the model weights, not X or Y. Remember, X & Y is your DATA and is supposed to be CONSTANT! The subscript “j” represents the jth model coefficient/weight.

Upon substituting for Y^ = w₀ + w₁X₁in the eq. 3 above, the final MSE Loss Function (L) looks as:

———– (5)

Clearly, L is a function of model weights (w₀ & w₁), whose optimal values we have to find upon minimizing L. The optimal values are represented by (*) in the figure below.

OLS Solution in Scaler Form

The eq. 5 given above represents the OLS Loss function in the scaler form (where we can see the summation of errors for each data point. The OLS algorithm is an analytical solution to the optimization problem presented in the eq. 4. This analytical solution consists of the following steps:

Step 1:

Step 2: Equate these gradients to zero and solve for the optimal values of the model coefficients w_j.

This basically means that the slope of the tangent (the geometrical interpretation of the gradients) to the Loss function at the optimal values (the point where L is minimum) will be zero, as shown in the figures above.

From the above equations, we can shift the “2” from the LHS to the RHS; the RHS remains as 0 (as 0/2 is still 0).

These expressions for w1* and w0* are the final OLS Analytical solution in the Scaler form.

Step 3: Compute the above means and substitute in the expression for w1* & w0*.

Let’s calculate these values for our dataset:

Let us calculate the same using Python code:


[OUTPUT]: This is the Equation of the "Best-fit" Line:
2.675 x + 2.875

You can see our “hand-calculated” values match very closely the values of slope and intercept obtained using NumPy (the small difference is due to round-off errors in our hand-calculations). We can also verify that the same OLS is “running behind the scenes” of the LinearRegression class from the scikit-learn package, as demonstrated in the code below.

# import the LinearRegression class from scikit-learn package
from sklearn.linear_model import LinearRegression

LR = LinearRegression() # create an instance of the LinearRegression class

# define your X and Y as NumPy Arrays (column vectors)
X = np.array([1,3,5]).reshape(-1,1)
Y = np.array([4.8,12.4,15.5]).reshape(-1,1)

LR.fit(X,Y) # calculate the model coefficients
LR.intercept_ # the bias or the intercept term (w0*)

[Output]: array([2.875])

LR.coef_ # the slope term (w1*)
[Output]: array([[2.675]])

OLS in Action Using an Actual Example

Here I am using the Boston House Pricing dataset, one of the most commonly encountered datasets while learning Data Science. The objective is to make a Linear Regression Model to Predict the median value of the House prices based on 13 features/attributes mentioned below.

Import and explore the dataset.

We’ll extract a single feature RM, the average room size in the given locality, and fit it with the target variable y (the median value of the house price).

Now, let’s use pure NumPy and calculate the model coefficients using the expressions derived for the optimal values of the model coefficients w0 & w1 above (end of Step 2).

Let us finally plot the original data along with the best-fit line, as given below.

Problems with the Scaler Form of OLS Solution

Finally, let me discuss the main problem with the above approach, as described in section 4. As you can see from the abovementioned dataset, any real-life dataset will have multiple features. I took only one feature to demonstrate the OLS method in the above section because increasing the number of features also increases the number of gradients and the equations to solve simultaneously.

For 13 features (Boston dataset above), we’ll have 13 model coefficients and one intercept term, which brings the total number of variables to be optimized to 14. Hence, we’ll obtain 14 gradients (the partial derivative of the loss function concerning each of these 14 variables). Consequently, we need to solve 14 equations (after equating these 14 partial derivatives to zero, as described in step 2). You have already realized the complexity of the analytical solution with just 2 variables. Frankly, I have tried to give you the MOST elaborate explanation of OLS available on the internet, and yet it is not easy to assimilate the mathematics.

Hence, in simple words, the above analytical solution is NOT SCALABLE!

The solution to this problem is the “Vectorized Form of the OLS Solution,” which I will discuss in detail in a follow-up article (Part 2 of this article), covering sections 7 & 8.

Conclusion

In conclusion, the OLS method is a powerful tool for estimating the parameters of a linear regression model. It is based on minimizing the sum of squared differences between the predicted and actual values.

I hope you enjoy the article and gain a clear understanding of the Ordinary Least Squares (OLS) regression, commonly referred to as the OLS model. This statistical technique estimates the relationships among variables. OLS linear regression minimizes the sum of squared differences, providing a robust method for predictive analysis in various fields.

Key Takeaway

Some of the key takeaways from the article are as follows:

The OLS solution represents itself in scalar form, making implementation and interpretation easy.
The article discussed optimization problems and the need for OLS in regression analysis and provided a mathematical formulation and an example of OLS in action.
The article also highlights some of the limitations of the scaler form of the OLS solution, such as scalability and the assumptions of linearity and constant variance. I hope you learned something new from this article.

Prashant Sahu

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Intoduction to Python

Variables and data types

OOPs Concepts

Conditional statement

Looping Constructs

Data Structures

String Manipulation

Functions

Modules, Packages and Standard Libraries

Python Libraries for Data Science

Reading Data Files in Python

Preprocessing, Subsetting and Modifying Pandas Dataframes

Sorting and Aggregating Data in Pandas

Visualizing Patterns and Trends in Data

Programming

A Comprehensive Guide to OLS Regression: Part-1

Table of contents

What is the OLS Regression Model?

What are Optimization Problems?

Examples

Why Do We Need OLS?

Understanding the Mathematics behind the OLS Algorithm

Finding Best-Fit Line

Linear Regression

OLS Solution in Scaler Form

OLS in Action Using an Actual Example

Problems with the Scaler Form of OLS Solution

Conclusion

Key Takeaway

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm

AnalyticsSyncHistory

lms_analytics

liap

visit

li_at

s_plt

lang

s_tp

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

s_pltp