4 Simple Ways to Split a Decision Tree in Machine Learning (Updated 2025)

Abhishek Sharma Last Updated : 12 Dec, 2024

8 min read

A decision tree is a powerful machine learning algorithm extensively used in the field of data science. They are simple to implement and equally easy to interpret. It also serves as the building block for other widely used and complicated machine-learning algorithms like Random Forest, XGBoost, and LightGBM. I often lean on the decision tree algorithm as my go-to machine learning algorithm, whether I’m starting a new project or competing in a hackathon. In this article, I will explain 4 simple methods for split a decision tree.

Learning Objectives

Learn how to split a decision tree based on different splitting criteria.
Get acquainted with the Reduction in Variance, Gini Impurity, Information Gain, and Chi-square in decision trees.
Know the difference between these different methods of splitting.

I assume familiarity with the basic concepts in regression and decision trees. Here are two free and popular courses to quickly learn or brush up on the key concepts:

Basic Decision Tree Terminologies
What Is Node Splitting in a Decision Tree and Why Is It Done?
Reduction in Variance in Decision Tree
Information Gain in Decision Tree
Gini Impurity in Decision Tree
Chi-Square in Decision Tree
Conclusion
Frequently Asked Questions

Basic Decision Tree Terminologies

Let’s quickly go through some of the key terminologies related to Split a decision tree which we’ll be using throughout this article.

Parent and Child Node: When a node divides into sub-nodes, it becomes the Parent Node, and the sub-nodes become Child Nodes. A node can divide into multiple sub-nodes, acting as the parent of several child nodes.
Root Node: The topmost node of a decision tree. It does not have any parent node. It represents the entire population or sample.
Leaf / Terminal Nodes: Nodes of the tree that do not have any child node are known as Terminal/Leaf Nodes.

There are multiple tree models to choose from based on their learning technique when building a decision tree, e.g., ID3, CART, Classification and Regression Tree, C4.5, etc. Selecting which decision tree to use is based on the problem statement. For example, for classification problems, we mainly use a classification tree with a gini index to identify class labels for datasets with relatively more number of classes.

In this article, we will mainly discuss the CART tree.

What Is Node Splitting in a Decision Tree and Why Is It Done?

Modern-day programming libraries have made using any machine learning algorithm easy, but this comes at the cost of hidden implementation, which is a must-know for fully understanding an algorithm. Another reason for this infinite struggle is the availability of multiple ways to split decision tree nodes adding to further confusion.

Before learning any topic, I believe it is essential to understand why you’re learning it. That helps in understanding the goal of learning a concept. So let’s understand why to learn about node splitting in decision trees.

Since you all know that how extensively decision trees are used, there is no denying the fact that learning about decision trees is a must. A decision tree makes decisions by splitting nodes into sub-nodes. It is a supervised learning algorithm. This process is performed multiple times in a recursive manner during the training process until only homogenous nodes are left. This is why a decision tree performs so well. The process of recursive node splitting into subsets created by each sub-tree can cause overfitting. Therefore, node splitting is a key concept that everyone should know.

Node splitting, or simply splitting, divides a node into multiple sub-nodes to create relatively pure nodes. This is done by finding the best split for a node and can be done in multiple ways. The ways of splitting a node can be broadly divided into two categories based on the type of target variable:

Continuous Target Variable: Reduction in Variance
Categorical Target Variable: Gini Impurity, Information Gain, and Chi-Square

We’ll look at each splitting method in detail in the upcoming sections. Let’s start with the first method of splitting – reduction in variance.

Also, Read about this article about Gsplitting decision tree with Gini Impurity

Reduction in Variance in Decision Tree

Reduction in Variance is a method for splitting the node used when the target variable is continuous, i.e., regression problems. It is called so because it uses variance as a measure for deciding the feature on which a node is split into child nodes.

Variance is used for calculating the homogeneity of a node. If a node is entirely homogeneous, then the variance is zero.

Here are the steps to split a decision tree using the reduction in variance method:

For each split, individually calculate the variance of each child node
Calculate the variance of each split as the weighted average variance of child nodes
Select the split with the lowest variance
Perform steps 1-3 until completely homogeneous nodes are achieved

The below video excellently explains the reduction in variance using an example:

Information Gain in Decision Tree

Now, what if we have a categorical target variable? For categorical variables, a reduction in variation won’t quite cut it. Well, the answer to that is Information Gain. The Information Gain method splits the nodes when the target variable is categorical. It operates based on the concept of entropy and is given by:

Entropy is used for calculating the purity of a node. The lower the value of entropy, the higher the purity of the node. The entropy of a homogeneous node is zero. Since we subtract entropy from 1, the Information Gain is higher for the purer nodes with a maximum value of 1. Now, let’s take a look at the formula for calculating the entropy:

Steps to split a decision tree using Information Gain:

For each split, individually calculate the entropy of each child node
Calculate the entropy of each split as the weighted average entropy of child nodes
Select the split with the lowest entropy or highest information gain
Until you achieve homogeneous nodes, repeat steps 1-3

Here’s a video on how to use information gain for splitting a decision tree:

Gini Impurity in Decision Tree

Gini Impurity is a method for splitting the nodes when the target variable is categorical. It is the most popular and easiest way to split a decision tree. The Gini Impurity value is:

Wait – what is Gini?

Gini is the probability of correctly labeling a randomly chosen element if it is randomly labeled according to the distribution of labels in the node. The formula for Gini is:

And Gini Impurity is:

The lower the Gini Impurity, the higher the homogeneity of the node. The Gini Impurity of a pure node is zero. Now, you might be thinking we already know about Information Gain then, why do we need Gini Impurity?

Gini Impurity is preferred to Information Gain because it does not contain logarithms which are computationally intensive.

Here are the steps to split a decision tree using Gini Impurity:

Similar to what we did in information gain. For each split, individually calculate the Gini Impurity of each child node
Calculate the Gini Impurity of each split as the weighted average Gini Impurity of child nodes
Select the split with the lowest value of Gini Impurity
Until you achieve homogeneous nodes, repeat steps 1-3

And here’s Gini Impurity in video form:

Chi-Square in Decision Tree

Chi-square is another method of splitting nodes in a decision tree for datasets having categorical target values. It is used to make two or more splits in a node. It works on the statistical significance of differences between the parent node and child nodes.

The Chi-Square value is:

Here, the Expected is the expected value for a class in a child node based on the distribution of classes in the parent node, and the Actual is the actual value for a class in a child node.

The above formula gives us the value of Chi-Square for a class. Take the sum of Chi-Square values for all the classes in a node to calculate the Chi-Square for that node. The higher the value, the higher will be the differences between parent and child nodes, i.e., the higher will be the homogeneity.

Here are the steps to split a decision tree using Chi-Square:

For each split, individually calculate the Chi-Square value of each child node by taking the sum of Chi-Square values for each class in a node
Calculate the Chi-Square value of each split as the sum of Chi-Square values for all the child nodes
Select the split with a higher Chi-Square value
Until you achieve homogeneous nodes, repeat steps 1-3

Here’s a video explaining Chi-Square in the context of a decision tree:

Conclusion

Decision trees are an important tool in machine learning for solving classification and regression problems. However, creating an effective decision tree requires choosing the right features and splitting the data in a way that maximizes information gain. After reading the above article, you know about different methods of splitting a decision tree.

Key Takeaways

In this article, we learned about how splitting in a decision tree works.
We got to learn about splitting in regression and classification problems.
There is no best method for splitting a decision tree, but there are some methods that are used more than others. It depends on your training data and the problem statement.

In the next steps, you can watch our complete playlist on decision trees on youtube. Or, you can take our free course on decision trees here.

I have also put together a list of fantastic articles on decision trees below:

Frequently Asked Questions

Q1. What is the best method for splitting a decision tree?

A. The most widely used method for splitting a decision tree is the gini index or the entropy. The default method used in sklearn is the gini index for the decision tree classifier. The scikit learn library provides all the splitting methods for classification and regression trees. You can choose from all the options based on your problem statement and dataset.

Q2. What are the advantages of a decision tree?

A. The main advantage of a decision tree is that it does not require normalization or scaling of data; hence lesser preprocessing is required for data preparation. Moreover, it is easier to understand and interpret as compared to other classification models.

Q3. How many splits are there in a decision tree?

A. For n number of classes in a decision tree, there are 2^(n -1) – 1 maximum splits.

Q4. What is partitioning in a decision tree?

Partitioning divides the dataset into subsets based on feature values, creating branches that group similar data points.

Abhishek Sharma

He is a data science aficionado, who loves diving into data and generating insights from it. He is always ready for making machines to learn through code and writing technical blogs. His areas of interest include Machine Learning and Natural Language Processing still open for something new and exciting.

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

MANEESH KUMAR SINGH

Hi, Abhishek Sharma I have one doubt with Decision Tree Splitting Method #3: Gini Impurity. You mension there Gini Impurity is a method for splitting the nodes when the target variable is continuous. is it correct? I have read below blog so i am confuse with it. https://www.analyticsvidhya.com/blog/2016/04/tree-based-algorithms-complete-tutorial-scratch-in-python/

Show 1 reply

Hi Maneesh, Thank you for pointing it out. I have made the necessary improvements.

Vasileios Anagnostopoulos

Hi Abhisek, you do not mention what is a "split". For finite discrete variables, the split is the various values the variable takes or a threshold. For a continuous or infinite discrete we need a threshold. When we go with the threshold, we need to find the optimal threshold. How is it done? Line scan? Any other clever way?

Vani Kapoor

Hi Abhishek, What criteria does C4.5 use for decision tree building. Is it information gain, like ID3.

Reading list

Basics of Machine Learning

Machine Learning Lifecycle

Importance of Stats and EDA

Understanding Data

Probability

Exploring Continuous Variable

Exploring Categorical Variables

Missing Values and Outliers

Central Limit theorem

Bivariate Analysis Introduction

Continuous - Continuous Variables

Continuous Categorical

Categorical Categorical

Multivariate Analysis

Different tasks in Machine Learning

Build Your First Predictive Model

Evaluation Metrics

Preprocessing Data

Linear Models

KNN

Selecting the Right Model

Feature Selection Techniques

Decision Tree

Feature Engineering

Naive Bayes

Multiclass and Multilabel

Basics of Ensemble Techniques

Advance Ensemble Techniques

Hyperparameter Tuning

Support Vector Machine

Advance Dimensionality Reduction

Unsupervised Machine Learning Methods

Recommendation Engines

Improving ML models

Working with Large Datasets

Interpretability of Machine Learning Models

Automated Machine Learning

Model Deployment

Deploying ML Models

Embedded Devices

4 Simple Ways to Split a Decision Tree in Machine Learning (Updated 2025)

Table of contents

Basic Decision Tree Terminologies

What Is Node Splitting in a Decision Tree and Why Is It Done?

Reduction in Variance in Decision Tree

Information Gain in Decision Tree

Gini Impurity in Decision Tree

Chi-Square in Decision Tree

Conclusion

Frequently Asked Questions

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#