An Introductory Note on Linear Regression

Karpuram Last Updated : 08 Feb, 2022


This article was published as a part of the Data Science Blogathon.

Introduction

In this article, I will explain Linear Regression, one of the basic machine learning algorithms. After reading it, you will have some basic knowledge of Linear Regression, its uses, its types, and how to measure the effectiveness of a model. Let us start with the table of contents.

What is Linear Regression

Uses of Linear Regression

Selection Criteria

Where will Linear Regression be used?

Types of Linear Regression

Understanding Linear Regression

How to find the effectiveness of the model?

R Square method

Regression analysis is a predictive modeling technique that investigates the relationship between X and Y, where X is the independent variable and Y is the dependent variable.

Types of Regression – There are two commonly used types of regression. One is Linear Regression, used when the dependent variable is continuous, and the other is Logistic Regression, used when the dependent variable is categorical.

Linear Regression

Linear Regression fits a straight line to a set of data points so that the line most closely follows the overall shape of the data.

In other words, regression shows how the dependent variable on the y-axis changes as the explanatory variable on the x-axis changes.

Uses of Regression

Determining the strength of predictors. For example, the relation between sales and marketing spending, or the relation between age and income.
Forecasting an effect. Regression is used to predict the impact of changes, that is, to understand how much the dependent variable changes with a change in the independent variable. For example, how much do sales increase with an extra 1000 rupees spent on marketing?
Trend forecasting. Regression can be used to get point estimates.

Selection Criteria

Classification and regression capabilities: Linear Regression predicts a continuous variable (for example, the temperature of a place).
Data quality: Each missing value removes one data point that could have improved the regression fit.
Computational complexity: Linear Regression is generally not computationally expensive compared with a decision tree or a clustering algorithm.
Comprehensible and transparent: Linear Regression is easy to understand and can be represented by a simple mathematical notation.

Where will Linear Regression be used?

Evaluating trends and sales estimates
Analyzing the impact of price changes
Estimation of risk in financial services and insurance domain

Types of Linear Regression

Linear Regression is of two types. One is positive Linear Regression, and the other is negative Linear Regression.

Positive Linear Regression – If the value of the dependent variable increases as the independent variable increases, the slope of the line is positive; such a regression is said to be Positive Linear Regression.

Source: Author

y=mx+c, where m is the slope of the line. In Positive Linear Regression, the value of m is positive.

Negative Linear Regression – If the value of the dependent variable decreases as the value of the independent variable increases, the slope of the line is negative; such a regression is said to be Negative Linear Regression.

Source: Author

In Negative Linear Regression, the value of m is negative.
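As a small illustration of the two types, the sign of m can be checked in code. This is only a sketch: the function name slope_direction and the "flat" label for m=0 are my own assumptions and are not part of the definitions above.

```python
def slope_direction(m: float) -> str:
    """Label a line y = m*x + c by the sign of its slope m."""
    if m > 0:
        return "positive"   # y increases as x increases
    if m < 0:
        return "negative"   # y decreases as x increases
    return "flat"           # assumption: m == 0 is neither type

print(slope_direction(0.4))    # positive
print(slope_direction(-1.5))   # negative
```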

Understanding Linear Regression

First of all, we need a data set to build the model.

Let us say the data is as below:

x	y
1	3
2	4
3	2
4	4
5	5

The values given are actual values.

Based on the above data, the line that most closely fits the points can be written as below.

y=mx+c, where m is the slope of the line and c is Y-intercept.

From now on, x(mean) is referred to as x(m) and y(mean) as y(m).

m as per the least squares method = ∑(x-x(m))(y-y(m)) / ∑(x-x(m))²

As per the above data table, x(m)=3 and y(m)=3.6.

x	y	x-x(m)	y-y(m)	(x-x(m))²	(x-x(m))(y-y(m))
1	3	-2	-0.6	4	1.2
2	4	-1	0.4	1	-0.4
3	2	0	-1.6	0	0
4	4	1	0.4	1	0.4
5	5	2	1.4	4	2.8

As per the equation of m, its value is m=4/10=0.4. The intercept follows from c=y(m)-m*x(m)=3.6-0.4*3=2.4, so the line equation is y=0.4x+2.4.
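The same calculation can be sketched in a few lines of plain Python. The function name fit_line is my own choice, and the intercept is taken as c=y(m)-m*x(m), which is the usual least squares result and matches the value c=2.4 used here.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 2, 4, 5]

def fit_line(xs, ys):
    """Return slope m and intercept c of the least squares line."""
    n = len(xs)
    x_mean = sum(xs) / n                     # x(m)
    y_mean = sum(ys) / n                     # y(m)
    # Numerator: sum of (x - x(m)) * (y - y(m))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: sum of (x - x(m)) squared
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    c = y_mean - m * x_mean                  # line passes through the mean point
    return m, c

m, c = fit_line(xs, ys)
print(round(m, 2), round(c, 2))  # 0.4 2.4
```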

x-x(m) is the horizontal distance of each point from the vertical line x=3, which is the mean of x.

y-y(m) is the vertical distance of each point from the horizontal line y=3.6, which is the mean of y.

Now we will calculate the predicted values of y based on the equation y=mx+c, where m=0.4 and c=2.4.

For x=1,y=0.4*1+2.4=2.8

For x=2,y=0.4*2+2.4=3.2

For x=3,y=0.4*3+2.4=3.6

For x=4,y=0.4*4+2.4=4.0

For x=5,y=0.4*5+2.4=4.4
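The five predictions above can also be produced in one step. This is a minimal sketch; the variable name predicted is my own, and the values are rounded to one decimal place only to hide floating-point noise.

```python
m, c = 0.4, 2.4
xs = [1, 2, 3, 4, 5]

# Apply y = m*x + c to every x.
predicted = [round(m * x + c, 1) for x in xs]
print(predicted)  # [2.8, 3.2, 3.6, 4.0, 4.4]
```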

Now we have the actual and predicted values of y. We need to calculate the distance between them, which is the error, and then reduce it. The line with the least error is the regression line, also called the best fit line.
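To make the error concrete, here is a short sketch that computes the difference between each actual and predicted value. Summing the squared differences is my assumption for how to combine them into one number; it is the quantity the least squares method minimizes.

```python
ys = [3, 4, 2, 4, 5]                    # actual values
predicted = [2.8, 3.2, 3.6, 4.0, 4.4]   # values from y = 0.4x + 2.4

# Error for each point: actual minus predicted.
errors = [y - yp for y, yp in zip(ys, predicted)]
# Sum of squared errors, so positive and negative errors do not cancel.
sse = sum(e ** 2 for e in errors)

print([round(e, 1) for e in errors])  # [0.2, 0.8, -1.6, 0.0, 0.6]
print(round(sse, 2))                  # 3.6
```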

Finding the best fit line:

For different values of m, we calculate the line equation y=mx+c; as the value of m changes, the equation changes. After every iteration, the predicted values change according to the line's equation. Each set of predicted values is compared with the actual values, and the value of m that gives the minimum difference gives the best fit line.
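The iteration described above could look like the sketch below. Several things here are my own assumptions for illustration: the candidate slopes 0.0, 0.1, ..., 1.0, the use of the sum of squared differences as the error, and choosing c for each m so that the line passes through the mean point (x(m), y(m)).

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 2, 4, 5]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

def squared_error(m):
    """Sum of squared differences between actual and predicted y for slope m."""
    # Assumption: pick c so the line passes through (x_mean, y_mean).
    c = y_mean - m * x_mean
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical candidate slopes: 0.0, 0.1, ..., 1.0
candidates = [k / 10 for k in range(0, 11)]
best_m = min(candidates, key=squared_error)
best_c = y_mean - best_m * x_mean

print(best_m, round(best_c, 1), round(squared_error(best_m), 2))  # 0.4 2.4 3.6
```

On this data the search lands on the same m=0.4 as the least squares formula, which is expected because the formula gives the slope with the minimum squared error directly.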

Let’s check the goodness of fit:

To test how well our model is performing, we have a method called the R Square method.

R square method

This method is based on a value called the R-Squared value. It measures how close the data is to the regression line, and it is also known as the coefficient of determination.

Source: Author

To check how good our model is, we compare the distance between the predicted value and the mean with the distance between the actual value and the mean; this gives the R² formula.

R²=∑(y_p-y(m))²/∑(y-y(m))²

If the value of R² is nearer to 1, then the model is more effective.

If the value of R² is far away from 1, then the model is less effective.

x	y	y-y(m)	(y-y(m))²	y_p	y_p-y(m)	(y_p-y(m))²
1	3	-0.6	0.36	2.8	-0.8	0.64
2	4	0.4	0.16	3.2	-0.4	0.16
3	2	-1.6	2.56	3.6	0	0
4	4	0.4	0.16	4.0	0.4	0.16
5	5	1.4	1.96	4.4	0.8	0.64

R²=1.6/5.2≈0.31

This value is far from 1, which means that the data points are far away from the regression line.

If the value of R² is 1, then all the actual data points lie on the regression line.
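The R² calculation can be checked with a short sketch. It uses the same formula as above, R²=∑(y_p-y(m))²/∑(y-y(m))²; the variable names explained and total are my own.

```python
ys = [3, 4, 2, 4, 5]                    # actual values
predicted = [2.8, 3.2, 3.6, 4.0, 4.4]   # values from y = 0.4x + 2.4

y_mean = sum(ys) / len(ys)              # y(m) = 3.6

# Numerator: squared distance of each predicted value from the mean.
explained = sum((yp - y_mean) ** 2 for yp in predicted)
# Denominator: squared distance of each actual value from the mean.
total = sum((y - y_mean) ** 2 for y in ys)

r_squared = explained / total
print(round(explained, 2), round(total, 2), round(r_squared, 2))  # 1.6 5.2 0.31
```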

Conclusion

We have covered the basics of Linear Regression, and we also measured the effectiveness of the model using the R square method. The R² value depends on the kind of data. For example, it might come close to 1 if the data is about a company's sales. It might be much lower if the data comes from the field of psychology, since different people behave differently. So the conclusion is that the closer the R² value is to 1, the more accurate the predicted values are.

Thanks for reading this article.

Connect with me on https://www.instagram.com/?hl=en.

Image Source: Author.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.

Karpuram

Hello Everyone,
This is Srivani. I completed my B.Tech in computer science. I am interested in Data Science and programming. Thanks for reading my articles; I hope you gain knowledge from them.
