Logistic Regression is a statistical model used for binary classification. It is named "Regression" because the underlying technique is similar to Linear Regression.
Here in this article, we will discuss:
Understanding the Basics (Logistic Regression).
Formulating the Equation (finding a better hyperplane).
The Solution to the Outlier Problem (Sigmoid).
Usage of a Monotonic Function (log).
The Need for Regularization (Lambda).
Getting Started with the Code (Logistic Regression vs SGD with log loss).
Let's say we have a problem with spam emails: we want to keep the non-spam (Ham) emails in the Inbox and send the Spam to the Spam folder.
Fig 1: Spam and Non-Spam email
We have two types of email, Spam and Ham. Let's look at the features that characterize a spam email. A spam email typically contains:
Unfriendly domain names
Repeated appearances of words such as "loan" and "amount"
Spelling mistakes
There will be lots of other features but for now, let’s stick with these.
If we plot the number of times the word "loan" appears in the email on the x-axis and the number of spelling mistakes on the y-axis, we get a fairly clean picture in which Spam and Ham can be separated by a line. This is the setting logistic regression is built for: data that is linearly separable into two classes.
Here, for example, we want to separate Ham from Spam, and for that we need good features that help with the classification.
Fig 2: Plot of the two features
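To make the feature idea concrete, here is a small, purely illustrative sketch (not from the original article) of how such features could be extracted from a raw email string; the known-word list and the spelling-mistake check are made-up placeholders:

import re

# Hypothetical sketch: turn a raw email into the two features plotted above.
# KNOWN_WORDS is a made-up stand-in for a real dictionary.
KNOWN_WORDS = {"please", "send", "the", "loan", "amount", "to", "this", "account", "is"}

def email_to_features(email_text):
    words = re.findall(r"[a-z]+", email_text.lower())
    loan_count = words.count("loan")                              # x-axis feature
    spelling_mistakes = sum(w not in KNOWN_WORDS for w in words)  # y-axis feature
    return [loan_count, spelling_mistakes]

print(email_to_features("Please send the loan amount, the loan is aproved"))  # [2, 1]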
Now, if the data (i.e. all the emails) is linearly separable, then a line can separate the data points (each email) into two classes (Spam and Ham). For humans it's easy to draw that line on a 2D chart, but for a computer we must spell out a precise set of instructions.
So let's label Ham as the positive class "+1" and Spam as the negative class "-1", and assume we have "n" features.
In 2-D the equation of the line is: w1x1 + w2x2 + w0 = 0
In 3-D the equation of the plane is: w1x1 + w2x2 + w3x3 + w0 = 0
In n-D the equation of the hyperplane is: w1x1 + w2x2 + w3x3 + … + wnxn + w0 = 0
=> [w1, w2, w3, …, wn]T [x1, x2, x3, …, xn] + w0 = 0
=> wTx + w0 = 0
If the hyperplane passes through the origin, then w0 becomes 0 and the equation reduces to wTx = 0. Our task now is to find the hyperplane that separates the data points.
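As a small illustrative sketch (the weight values below are made up), a computer can apply this rule by checking the sign of wTx + w0 for each point:

import numpy as np

# Hypothetical weights for the 2-D case: w1*x1 + w2*x2 + w0
w = np.array([-0.8, -0.6])  # normal vector of the hyperplane (a line in 2-D)
w0 = 2.0                    # intercept term

def classify(x):
    # Positive side of the hyperplane -> +1 (Ham), negative side -> -1 (Spam)
    return 1 if np.dot(w, x) + w0 > 0 else -1

print(classify(np.array([0.0, 1.0])))  # +1: few "loan" words / mistakes -> Ham
print(classify(np.array([4.0, 3.0])))  # -1: many "loan" words / mistakes -> Spam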
Let's construct a candidate hyperplane by choosing random values wi for its unit normal vector "w". Now we want to find out how many data points this hyperplane classifies correctly.
For this we use the signed distance of a point from the hyperplane, i.e. di = wTxi / ||w||. Since we are taking w to be a unit normal vector (||w|| = 1), this simplifies to di = wTxi.
Now, if a data point lies on the same side as the normal vector it belongs to the positive class; otherwise it belongs to the negative class.
Let's consider:
A positive-class data point xi lying in the direction of the normal vector: wTxi will be greater than 0, and multiplying by the class label yi = +1 gives yiwTxi > 0.
A negative-class data point xj lying in the opposite direction of the normal vector: wTxj will be less than 0, and since yj = -1, again yjwTxj > 0.
Fig 3: How the hyperplane separates the data points
For the hyperplane h1 above, let us sum over all the data points.
Count yiwTxi as +1 for a correctly classified point and -1 for a misclassified one:
sum(h1) = 1 + 1 + 1 + 1 + (-1) + (-1)
= 2
Now suppose another hyperplane h2 classifies five of the six points correctly, so that sum(h2) = 4; then h2 is a better hyperplane than h1.
So, we want the best hyperplane which minimizes the number of misclassifications and maximizes the sum.
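Here is a rough sketch of that comparison using toy, made-up data and two hypothetical weight vectors, scoring each hyperplane by counting +1 for every correctly classified point and -1 for every misclassified one:

import numpy as np

# Toy data: each row of X is a point xi (the hyperplane is assumed to pass
# through the origin), and y holds the class labels +1 / -1.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0],
              [-2.0, -1.5], [0.5, -0.3], [-0.4, 0.6]])
y = np.array([1, 1, -1, -1, 1, -1])

def score(w):
    # +1 for every correctly classified point, -1 for every misclassified one
    return int(np.sum(np.sign(y * (X @ w))))

w_h1 = np.array([1.0, 1.0])  # candidate hyperplane h1
w_h2 = np.array([1.0, 0.0])  # candidate hyperplane h2
print(score(w_h1), score(w_h2))  # h2 scores higher here, so h2 is preferred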
Now, if we sum the raw signed distances yiwTxi (rather than just counting ±1), an outlier on either side of the hyperplane will heavily impact the sum, and hence the selection of the best hyperplane.
For example:
Fig 4: Outlier affecting the hyperplane
Even though the hyperplane separates almost all of the data points correctly, the summation gets dominated by the outlier.
Since |wTxoutlier| >> |wTxcorrectly_classified| and the outlier lies on the wrong side of the hyperplane, its term yiwTxoutlier is a large negative value that drags down the overall sum for that hyperplane.
To deal with outliers that distort the choice of w, we use the sigmoid function.
sigmoid(x) = 1 / (1 + e^(-x))
Let us assume we have an outlier, so that |wTxoutlier| >> |wTxcorrectly_classified|.
Now if we apply the sigmoid function, then sigmoid(wTxoutlier) << wTxoutlier.
As wTx becomes larger the sigmoid tapers it off: the values of wTx are squashed from (-infinity, +infinity) into the range (0, 1).
Fig 5: The sigmoid function and how it dampens the effect of outliers
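A quick numerical check of this squashing effect, using arbitrary example values of wTx:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A few ordinary signed distances and one huge outlier
raw = np.array([-2.0, -0.5, 0.5, 2.0, 100.0])
print(sigmoid(raw))
# [0.119 0.378 0.622 0.881 1.0] -- the outlier's contribution is capped near 1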
We know that a monotonically increasing function grows as x grows, and applying such a function to another function does not change where that function attains its maximum or minimum, i.e.
argmin f(x) = argmin g(f(x)), for any monotonically increasing g
Also, argmax f(x) = argmin (-f(x))
Log is a monotonically increasing function, and log(1/x) = -log(x)
Our logistic regression objective is therefore:
w* = argmax over w of sum(i = 1 to n) 1 / (1 + e^(-yiwTxi))
Taking the log of each term and using the identities above (log(1/x) = -log(x), and argmax f = argmin -f), this becomes the familiar log-loss form:
w* = argmin over w of sum(i = 1 to n) log(1 + e^(-yiwTxi))
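As a minimal sketch with hypothetical weights and toy data, the quantity being minimized can be computed directly:

import numpy as np

# Hypothetical weight vector and toy data with labels in {+1, -1}
w = np.array([0.5, -0.25])
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.0]])
y = np.array([-1, 1, -1])

z = y * (X @ w)                          # zi = yi * wTxi
log_loss = np.sum(np.log1p(np.exp(-z)))  # sum over i of log(1 + exp(-zi))
print(log_loss)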
We have seen how these functions help us formulate the problem and overcome various issues. Now let us look at one of the most important topics: the bias-variance trade-off.
Suppose we have a dataset containing emails, and we receive a new email for classification that contains the word "loan" multiple times. Our model immediately classifies it as "Spam" simply because it contains the word "loan". While optimizing the logistic regression loss function, the model has learned that any email containing the word "loan" multiple times is "Spam". This is overfitting, i.e. high variance: we are trying to learn every detail of the training data, which causes the model to overfit.
Regularization can significantly reduce the variance without substantially increasing the bias. Regularization refers to modifying a learning algorithm so that it favours "simpler" prediction rules and avoids overfitting; concretely, it modifies the loss function to penalize certain values of the weights.
Let zi = yiwTxi
If zi -> +infinity, then exp(-zi) -> 0,
so log(1 + exp(-zi)) -> 0,
and the total loss approaches 0, its minimum possible value.
If the selected "w" classifies all the training points correctly, then by scaling w up so that every zi tends to +infinity, the training loss can be driven to 0 and this "w" looks like the best possible "w" on the training data. This is a condition of overfitting, as the training data may contain outliers to which our model has been perfectly fitted. So we add a regularization term to deal with the problem.
The regularized objective then becomes w* = argmin over w of sum(i = 1 to n) log(1 + e^(-yiwTxi)) + lambda * wTw, where "lambda" is a hyperparameter.
If lambda = 0, we are back to optimizing only the loss term, which results in an overfitting model, i.e. high variance.
If lambda is very large, the influence of the loss term diminishes and we end up with an underfitting model, i.e. high bias.
Fig 6: Logistic loss vs regularization
The regularization term prevents the components of w from growing towards +infinity or -infinity.
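Continuing the earlier sketch (weights and data are again hypothetical), the L2-regularized objective with the hyperparameter lambda can be computed like this:

import numpy as np

def regularized_log_loss(w, X, y, lam):
    # sum over i of log(1 + exp(-yi * wTxi))  +  lambda * ||w||^2
    z = y * (X @ w)
    return np.sum(np.log1p(np.exp(-z))) + lam * np.dot(w, w)

X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.0]])
y = np.array([-1, 1, -1])
w = np.array([0.5, -0.25])

print(regularized_log_loss(w, X, y, lam=0.0))   # no penalty: pure log loss
print(regularized_log_loss(w, X, y, lam=10.0))  # large lambda: penalty term dominates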
We can also use L1 regularization, which induces sparsity in the weight vector (most of its elements become exactly zero). With L1 regularization the less important features vanish from the model, whereas with L2 regularization their weights become small but remain non-zero, as the sketch below illustrates.
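A small illustration with scikit-learn (the dataset and the value of C are arbitrary choices): fitting the same data with an L1 penalty (using a solver that supports it, such as liblinear) and with the default L2 penalty, then comparing the coefficients, shows L1 driving some of them to exactly zero while L2 only shrinks them:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# L1 regularization tends to drive some coefficients exactly to zero...
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
# ...while L2 regularization only shrinks them towards zero.
l2_model = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

print(l1_model.coef_)
print(l2_model.coef_)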
Let's look at the code for Logistic Regression.
First we use scikit-learn's LogisticRegression, which by default applies L2 regularization with the lbfgs solver. Its hyperparameter C is the inverse of lambda, i.e. C = 1/lambda: as C increases the model tends to overfit, and as C decreases it tends to underfit.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression  # Importing Logistic Regression and the iris dataset

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(C=0.01).fit(X, y)  # Setting the hyperparameter C for Logistic Regression and training the model
clf.predict(X[:2, :])        # Predicting the class for the first two data points
clf.predict_proba(X[:2, :])  # We can also predict the probability of each class
With SGDClassifier, simply changing the loss parameter lets us switch between different classifiers: for Logistic Regression we use loss="log", for a linear SVM we use loss="hinge", and so on.
from sklearn.linear_model import SGDClassifier  # Importing the SGDClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = SGDClassifier(loss="log")  # Setting loss="log" for Logistic Regression
clf.fit(X, y)                    # Training the model
clf.predict([[2., 2.]])          # Predicting the class for a new data point
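Note that in recent scikit-learn releases the logistic loss option has been renamed, so depending on your installed version you may need loss="log_loss" instead of loss="log".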
Logistic Regression performs well when the data is linearly separable, but real-world data is rarely perfectly linearly separable. It is comparatively less prone to overfitting, yet we should still use regularization to keep the model in check.