How the Gradient Boosting Algorithm Works?

Nitesh Last Updated : 30 Jan, 2025

6 min read

Gradient Boosting is a powerful machine learning algorithm that reduces bias error in models. Unlike AdaBoost, where we specify the base estimator, Gradient Boosting uses a fixed Decision Stump. We can tune the n_estimators parameter, which defaults to 100 if not specified. This algorithm works for both regression (using Mean Squared Error) and classification (using Log Loss). This article will cover key topics related to Gradient Boosting.

This article was published as a part of the Data Science Blogathon.

Manual Example for understanding the algorithm:
- 1 st -Estimator
- 2 nd -Estimator
Finding the best estimator with GridSearchCV
Applications
Frequently Asked Questions

Manual Example for understanding the algorithm:

Let us now understand the working of the Gradient Boosting Algorithm with the help of one example. In the following example, Age is the Target variable whereas LikesExercising, GotoGym, DrivesCar are independent variables. As in this example, the target variable is continuous, GradientBoostingRegressor is used here.

1^st-Estimator

For estimator-1, the root level (level 0) will consist of all the records. The predicted age at this level is equal to the mean of the entire Age column i.e. 41.33(addition of all values in Age column divided by a number of records i.e. 9). Let us find out what is the MSE for this level. MSE is calculated as the mean of the square of errors. Here error is equal to actual age-predicted age. The predicted age for the particular node is always equal to the mean of age records of that node. So, the MSE of the root node of the 1^st estimator is calculated as given below.

MSE=(∑(Age_i–mu)² )/9=577.11

The cost function hers is MSE and the objective of the algorithm here is to minimize the MSE.

Now, one of the independent variables will be used by the Gradient boosting to create the Decision Stump. Let us suppose that the LikesExercising
is used here for prediction. So, the records with False LikesExercising will go in one child node, and records with True LikesExercising will go in another child node as shown below.

Let us find the means and MSEs of both the nodes of level 1. For the left node, the mean is equal to 20.25 and MSE is equal to 83.19. Whereas, for the right node, the mean is equal to 58.2 and MSE is equal to 332.16. Total MSE for level 1 will be equal to the addition of all nodes of level 1 i.e. 83.19+332.16=415.35. We can see here, the cost function i.e. MSE of level 1 is better than level 0.

2^nd-Estimator

Let us now find out the estimator-2. Unlike AdaBoost, in the Gradient boosting algorithm, residues (age_i – mu)of the first estimator are taken as root nodes as shown below. Let us suppose for this estimator another dependent variable is used for prediction. So, the records with False GotoGym
will go in one child node, and records with True GotoGym will go in another child node as shown below.

The prediction of age here is slightly tricky. First, the age will be predicted from estimator 1 as per the value of LikeExercising, and then the mean from the estimator is found out with the help of the value of GotoGym and then that means is added to age-predicted from the first estimator and that is the final prediction of Gradient boosting with two estimators.

Let us consider if we want to predict the age for the following records:

Here, LikesExercising is equal to False. So, the predicted age from the first estimator will be 20.25 (i.e. mean of the left node of the first estimator). Now we need to check what is the value of GotoGym for the second predictor and its value is True. So, the mean of True GotoGym in the 2^nd estimator is -3.56. This will be added to the prediction of the first estimator i.e. 20.25. So final prediction of this model will be 20.25+(-3.56) = 16.69.

Let us predict of ages of all records we have in the example:

Let us now find out the Final MSE for above all 9 records.

MSE = ((-2.69)² +(-1.69)²+ (-0.69)² +(-28.64)² +(19.31)² +(-15.33)² + (14.36)² +(6.67)² +(7.67)² )/9

= 194.2478

So, we can see that the final MSE is much better than the MSE of the root node of the 1^st Estimator. This is only for 2 estimators. There can be n number of estimators in gradient boosting algorithm.

Checkout this article about Hyperparameter Tuning and its Techniques

Python Code for the Same

Let us now write a python code for the same.

# Importing required modules
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import LabelEncoder 
# Let us create the Data-Frame for above 
X=pd.DataFrame({'LikesExercising':[False,False,False,True,False,True,True,True,True],
                'GotoGym':[True,True,True,True,True,False,True,False,False],
                 'DrivesCar':[True,False,False,True,True,False,True,False,True]})
Y=pd.Series(name='Age',data=[14,15,16,26,36,50,69,72,74])
# Let us encode true and false to number value 0 and 1
LE=LabelEncoder()
X['LikesExercising']=LE.fit_transform(X['LikesExercising'])
X['GotoGym']=LE.fit_transform(X['GotoGym'])
X['DrivesCar']=LE.fit_transform(X['DrivesCar'])

#We will now see the effect of different numbers of estimators on MSE.
# 1) Let us now use  GradientBoostingRegressor with 2 estimators to train the model and to predict the age for the same inputs. 
GB=GradientBoostingRegressor(n_estimators=2)
GB.fit(X,Y)
Y_predict=GB.predict(X) #ages predicted by model with 2 estimators 
Y_predict
# Output
#Y_predict=[38.23 , 36.425, 36.425, 42.505, 38.23 , 45.07 , 42.505, 45.07 ,47.54]
#Following code is used to find out MSE of prediction with Gradient boosting algorithm having estimator 2.
MSE_2=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for two estimators :',MSE_2)
#Output: MSE for two estimators : 432.482055555546
# 2) Let us now use GradientBoostingRegressor with 3 estimators to train the model and to predict the age for the same inputs. 
GB=GradientBoostingRegressor(n_estimators=3)
GB.fit(X,Y)
Y_predict=GB.predict(X) #ages predicted by model with 3 estimators
Y_predict
# Output
#Y_predict=[36.907, 34.3325, 34.3325, 43.0045, 36.907 , 46.663 , 43.0045, 46.663 , 50.186]
#Following code is used to find out MSE of prediction with Gradient boosting algorithm having estimator 3.
MSE_3=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for three estimators :',MSE_3)
#Output: MSE for three estimators : 380.05602055555556
# 3) Let us now use GradientBoostingRegressor with 50 estimators to train the model and to predict the age for the same inputs.
GB=GradientBoostingRegressor(n_estimators=50)
GB.fit(X,Y)
Y_predict=GB.predict(X) #ages predicted by model with 50 estimators
Y_predict
# Output
#Y_predict=[25.08417833, 15.63313919, 15.63313919, 47.46821839, 25.08417833,       60.89864242, 47.46821839, 60.89864242, 73.83164334]
#Following code is used to find out MSE of prediction with Gradient boosting algorithm having estimator 50.
MSE_50=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for fifty estimators :',MSE_50)
#Output: MSE for fifty estimators: 156.5667260994211

Observation

As we can see here, MSE reduces as we increase the estimator value. The situation comes where MSE becomes saturated which means even if we increase the estimator value there will be no significant decrease in MSE.

Finding the best estimator with GridSearchCV

Let us now see, how GridSearchCV can be used to find the best estimator for the above example.

from sklearn.model_selection import GridSearchCV
model=GradientBoostingRegressor()
params={'n_estimators':range(1,200)}
grid=GridSearchCV(estimator=model,cv=2,param_grid=params,scoring='neg_mean_squared_error')
grid.fit(X,Y)
print("The best estimator returned by GridSearch CV is:",grid.best_estimator_)
#Output
#The best estimator returned by GridSearch CV is:  GradientBoostingRegressor(n_estimators=19)
GB=grid.best_estimator_
GB.fit(X,Y)
Y_predict=GB.predict(X)
Y_predict
#output:
#Y_predict=[27.20639114, 18.98970027, 18.98970027, 46.66697477, 27.20639114,58.34332496, 46.66697477, 58.34332496, 69.58721772]
MSE_best=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for best estimators :',MSE_best)
#Following code is used to find out MSE of prediction for Gradient boosting algorithm with best estimator value given by GridSearchCV
#Output: MSE for best estimators : 164.2298548605391

Observation

You may think that MSE for n_estimator=50 is better than MSE for n_estimator=19. Still GridSearchCV returns 19 not 50. Actually, we can observe here is until 19 with each increment in estimator value the reduction in MSE was significant, but after 19 there is no significant decrease in MSE with increment in estimators. So, n_estimator=19 was returned by GridSearchSV.

Applications

Gradient Boosting Algorithm is generally used when we want to decrease the Bias error.
It can be used in regression as well as classification problems. In regression problems, the cost function is MSE whereas, in classification problems, the cost function is Log-Loss.

Conclusion

This article explains how Gradient Boosting works using a simple example. The Gradient Boosting Algorithm is primarily used to reduce Bias error and includes models like GradientBoostingRegressor (using MSE as the cost function) and GradientBoostingClassifier (using Log-Loss for classification). A key aspect of this algorithm is determining the optimal value for n_estimators, which can be achieved using GridSearchCV. After reading, it is recommended to practice applying this algorithm and then learn how to tune its hyper-parameters effectively.

Frequently Asked Questions

Q1. What is gradient boosting and how it works?

A. Gradient boosting, an ensemble machine learning technique, constructs a robust predictive model by sequentially combining multiple weak models. It minimizes errors using a gradient descent-like approach during training.

Q2. How does gradient boost do regression?

A. In regression, gradient boosting fits a series of weak learners, typically decision trees, to residuals, gradually reducing prediction errors through model updates.

Q3. How does light gradient boosting work?

A. Light Gradient Boosting, or LightGBM, employs a tree-based learning algorithm, utilizing a histogram-based method for faster training and reduced memory usage.

Q4. How does gradient boosting reduce bias?

A. Gradient boosting reduces bias by iteratively adding weak models, focusing on remaining errors and emphasizing previously mispredicted instances. This sequential process refines predictions, ultimately minimizing bias in the final model

Nitesh

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Basics of Machine Learning

Machine Learning Lifecycle

Importance of Stats and EDA

Understanding Data

Probability

Exploring Continuous Variable

Exploring Categorical Variables

Missing Values and Outliers

Central Limit theorem

Bivariate Analysis Introduction

Continuous - Continuous Variables

Continuous Categorical

Categorical Categorical

Multivariate Analysis

Different tasks in Machine Learning

Build Your First Predictive Model

Evaluation Metrics

Preprocessing Data

Linear Models

KNN

Selecting the Right Model

Feature Selection Techniques

Decision Tree

Feature Engineering

Naive Bayes

Multiclass and Multilabel

Basics of Ensemble Techniques

Advance Ensemble Techniques

Hyperparameter Tuning

Support Vector Machine

Advance Dimensionality Reduction

Unsupervised Machine Learning Methods

Recommendation Engines

Improving ML models

Working with Large Datasets

Interpretability of Machine Learning Models

Automated Machine Learning

Model Deployment

Deploying ML Models

Embedded Devices

How the Gradient Boosting Algorithm Works?

Table of contents

Manual Example for understanding the algorithm:

1st-Estimator

2nd-Estimator

Python Code for the Same

Observation

Finding the best estimator with GridSearchCV

Observation

Applications

Conclusion

Frequently Asked Questions

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

1^st-Estimator

2^nd-Estimator