Feature Selection Techniques in Machine Learning

Dhanya Thailappan Last Updated : 25 Sep, 2024

Feature selection plays a crucial role in building accurate and efficient machine learning models. In this article, we explore various feature selection techniques, from filter to wrapper and embedded methods, that help reduce data dimensionality and improve model performance. You will learn what feature selection is, which feature selection algorithms and methods are available, and how to choose the most appropriate approach for your dataset.

This article was published as a part of the Data Science Blogathon.

What Are Feature Selection Techniques in Machine Learning?
Need for Feature Selection Techniques in Machine Learning
Filter Method
Wrapper Method
Embedded Methods
Univariate Selection
Feature Importance
Correlation Matrix with Heatmap
Why Is Feature Selection Important?
Tips and Tricks for Feature Selection
Master the ML Feature Selection Techniques

What Are Feature Selection Techniques in Machine Learning?

Feature selection is an important process in machine learning and data analysis. It involves selecting a subset of relevant features from a larger set of available features. These features are also known as variables, predictors, or attributes. The primary objective of feature selection is to identify and retain the most informative and relevant features while discarding or ignoring the irrelevant or redundant ones. By doing so, we can improve the performance of our models by focusing on the most meaningful information and avoiding noise or unnecessary complexity.

In practice, feature selection also reduces the dimensionality of the data. There are various methods for choosing the best set of features for a given dataset, including filter, wrapper, and embedded methods, and all of them aim to keep the features that have the most predictive power.

Need for Feature Selection Techniques in Machine Learning

Feature selection reduces the dimensionality of the data, making it easier for the model to learn and reducing the risk of overfitting.
It removes irrelevant or redundant features that can negatively impact model performance and accuracy.
It helps to identify the most important features that have the most predictive power, allowing models to be more efficient and effective.
By reducing the number of features, feature selection can also help to reduce training time and computational costs.
Feature selection is essential in building accurate and efficient machine learning models that can generalize well to new data.
It can also improve the interpretability of models by highlighting the most important factors that contribute to predictions.
Different feature selection techniques, including filter, wrapper, and embedded methods, can be used depending on the type of data and the modeling approach.
It is an ongoing process, and it may be necessary to revisit feature selection as new data becomes available or as the model is refined.

Types of Feature Selection Techniques

The choice of feature selection technique depends on the type and amount of data available, as well as the modeling approach. It’s important to experiment with different methods to find the best approach for a given problem.

Filter methods: These methods rank the features based on statistical measures such as correlation, mutual information, or chi-squared tests. Features with the highest scores are selected for the model.
Wrapper methods: These methods involve training and evaluating the model with different subsets of features, using a search algorithm to find the optimal set of features that maximizes model performance.
Embedded methods: These methods incorporate feature selection into the model training process, selecting the most relevant features during the training of the model.
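Principal Component Analysis (PCA): This method transforms the data into a lower-dimensional space by identifying linear combinations of features that capture the most significant variability in the data. Strictly speaking, this is feature extraction rather than feature selection, because the resulting components are new variables and not a subset of the original ones.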
Recursive Feature Elimination (RFE): This method iteratively removes the least important features from the model until the desired number of features is reached.
Lasso Regression: This method performs regularization by adding a penalty term to the model’s loss function, which encourages the model to select a sparse set of features.
Genetic Algorithms: These methods use an evolutionary search algorithm to find the optimal set of features that maximizes model performance.
Univariate Feature Selection: This method selects the features that have the strongest relationship with the target variable, based on statistical tests such as ANOVA or t-tests.

Let’s understand each of these methods in depth!

Filter Method

First, let’s look at the filter method.

In the filter method, we start with the full set of features and select the best subset using a statistical measure, before any model is trained.

How do we select the best subset?

We can apply various techniques. Three common ones are the ANOVA test, the chi-square test, and the correlation coefficient. These techniques score each feature, and the important features are the ones that are strongly related to the target output.

Let’s take an example. Here I have an independent variable X and a target variable Y.

X	Y
1	10
2	20
3	30
4	40

In this scenario, you can see that as X increases, Y also increases. So, in terms of the correlation coefficient, X and Y are highly correlated. There are two related terms here: covariance and correlation. Covariance is unbounded and depends on the units of the variables, so it can take any value from negative to positive infinity. Correlation rescales covariance to the range -1 to +1. The correlation referred to here is the Pearson correlation coefficient.
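
As a quick check of the example above, here is a minimal sketch that computes both quantities for the X and Y values in the table. It uses NumPy, which is my choice for illustration, and it uses the sample (n - 1) versions of covariance and standard deviation.

```python
import numpy as np

# Values from the example table above.
x = np.array([1, 2, 3, 4], dtype=float)
y = np.array([10, 20, 30, 40], dtype=float)

# Sample covariance (divides by n - 1); its size depends on the units of x and y.
covariance = np.cov(x, y)[0, 1]

# Pearson correlation: covariance divided by the product of the standard deviations.
pearson_r = covariance / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(f"covariance = {covariance:.2f}")  # covariance = 16.67
print(f"pearson r  = {pearson_r:.2f}")   # pearson r  = 1.00
```

The covariance here is well above 1, while the Pearson coefficient is exactly 1 because Y is a perfect linear function of X.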

The second technique is the wrapper method.

Wrapper Method

The wrapper method is conceptually simpler than the filter method. Here, you don’t need to apply any statistical tests. Instead, you repeatedly train a model on different subsets of features and compare the results. There are three basic mechanisms for this.

Let me explain it.

Forward Selection

This method selects the most important features from the dataset with respect to the target output. It is an iterative method in which we start with no features in the model. In each iteration, it adds the feature that improves the model the most.

Let me explain this with an example.

I am considering A, B, C, D, and E as my independent features. Let F be the output or target feature.

Initially, the model is trained with a single feature, say A, and the accuracy is recorded. In the next iteration, it is trained with A and B, and the accuracy is recorded again. If this accuracy is better than the previous one, B is added to the feature set. Likewise, each iteration tries adding another feature, and the process stops when adding a feature no longer improves the accuracy.

This is what forward selection is.
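
The loop described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the data is synthetic (generated with scikit-learn's make_classification as a stand-in for features A to E), the model is a logistic regression, and accuracy is measured with 5-fold cross-validation. The helper cv_accuracy is a hypothetical name introduced for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for features A-E and target F (not the article's dataset).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
names = ["A", "B", "C", "D", "E"]


def cv_accuracy(columns):
    """Mean 5-fold cross-validated accuracy using only the given columns."""
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, columns], y, cv=5).mean()


selected, best_score = [], 0.0
remaining = list(range(X.shape[1]))

while remaining:
    # Try adding each remaining feature to the current subset.
    scores = {j: cv_accuracy(selected + [j]) for j in remaining}
    best_j = max(scores, key=scores.get)
    if scores[best_j] <= best_score:
        break  # no candidate improves the accuracy, so stop
    selected.append(best_j)
    remaining.remove(best_j)
    best_score = scores[best_j]

print("selected features:", [names[j] for j in selected])
print("cross-validated accuracy:", round(best_score, 3))
```

Each pass trains one model per remaining feature, which is why forward selection becomes costly as the number of features grows.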

Next, let’s look at backward elimination.

Backward Elimination

This works slightly differently. Let’s use the same example: A, B, C, D, and E are the independent features, and F is the target variable. Here, I start with all the independent features and train the model. Then I apply a statistical test (for example, checking each feature’s p-value) that tells me which feature has the lowest impact on the target variable. That feature is removed, the model is retrained on the remaining features, and the process repeats until every remaining feature has a meaningful impact. This is how backward elimination is implemented.
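
One way to try backward elimination in code is scikit-learn's SequentialFeatureSelector with direction="backward". Note that this is a variant of the idea: it decides which feature to drop by cross-validated score rather than by a statistical test. The synthetic data, the logistic regression model, and the choice to keep 3 features are assumptions made for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for features A-E and target F (not the article's dataset).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
names = ["A", "B", "C", "D", "E"]

# Start from all five features and drop one at a time until three remain.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="backward",
    cv=5,
)
selector.fit(X, y)

kept = [name for name, keep in zip(names, selector.get_support()) if keep]
print("kept features:", kept)
```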

Let me explain the recursive feature elimination.

Recursive Feature Elimination

It is a greedy optimization algorithm. The main aim of this method is to select the best-performing feature subset. It does not select features at random. Instead, it trains a model on all the features, ranks them by importance (for example, by coefficient or feature importance), and removes the least important one. It then retrains on the remaining features and repeats until the desired number of features is left. Finally, it ranks all the features by the order in which they were eliminated.
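
A minimal sketch of recursive feature elimination with scikit-learn's RFE class is shown here. The synthetic data, the logistic regression estimator, and the target of 2 features are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data with five features (not the article's dataset).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# Remove one feature per iteration (step=1) until two features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=2, step=1)
rfe.fit(X, y)

print("selected mask:", rfe.support_)
print("ranking (1 = kept):", rfe.ranking_)
```

In ranking_, the kept features have rank 1, and larger numbers mean the feature was eliminated earlier.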

Remember that the above-mentioned techniques are most practical when the dataset is small, because each one trains the model many times.

But in reality, you will often work with a large dataset, where this becomes expensive.

Let’s try to understand the third technique called embedded methods.

Embedded Methods

Let me start with an example. I have A, B, C, D, and E as independent variables, and F is the target variable. Unlike a wrapper method, an embedded method does not train a separate model for each candidate subset such as A, then AB, and so on. Instead, feature selection happens as part of training a single model: the learning algorithm itself decides how much weight each feature gets. For example, Lasso regression adds a penalty that can shrink the coefficients of unhelpful features to exactly zero, and tree-based models assign each feature an importance score. The features that end up with non-zero coefficients or high importance form the selected subset. That is how an embedded method works.
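
As a rough illustration of an embedded method, the sketch below fits a Lasso regression and keeps the features whose coefficients are non-zero. It assumes a synthetic regression problem (the article's own dataset is a classification task), standardized features, and an arbitrary penalty strength alpha=1.0.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic regression data: only 2 of the 5 features drive the target.
X, y = make_regression(n_samples=200, n_features=5, n_informative=2,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)  # put features on a common scale

# The L1 penalty (alpha) pushes the coefficients of unhelpful features toward zero.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

kept = np.flatnonzero(lasso.coef_ != 0)
print("coefficients:", np.round(lasso.coef_, 2))
print("indices of kept features:", kept)
```

Selection and training happen in the same fit call here, which is the defining trait of embedded methods. A larger alpha typically removes more features.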

Let’s go and find out how univariate selection is done.

Univariate Selection

Univariate selection uses a statistical test to select the features that have the strongest relationship with the target variable.

Here, I am using the SelectKBest class from scikit-learn. If you set k to 5, it will pick the 5 attributes with the best scores with respect to the target variable.

I am using a mobile price classification dataset. You can download it here.

import pandas as pd
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

# Load the mobile price classification training data
data = pd.read_csv("train.csv")
X = data.iloc[:, 0:20]  # independent features (first 20 columns)
y = data.iloc[:, -1]    # target (last column)

The dataset has many features, and we have to select the best ones. As you may know from the curse of dimensionality, adding features beyond a certain threshold can cause the accuracy of the model to decrease.
For that, I am using univariate selection with SelectKBest.

bestfeatures = SelectKBest(score_func=chi2, k=10)
fit = bestfeatures.fit(X,y)

After fitting, I get two attributes. One is fit.scores_, which holds the chi-square test score for each feature. The other is fit.pvalues_, which holds the corresponding p-values.

dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)

In the next statement, I concatenate the two data frames for easier reading and rename the columns to Specs and Score.

featureScores = pd.concat([dfcolumns,dfscores],axis=1)
featureScores.columns = ['Specs','Score']

Here, you can see all the features. The higher the score, the more important the feature is. Here, ram has the highest score.

featureScores

I am printing the top 10 features.

print(featureScores.nlargest(10,'Score'))

These 10 best features can be used to train the model.
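
To show how the selected features can then be passed on to a model, here is a small self-contained sketch. It assumes the Iris dataset bundled with scikit-learn instead of train.csv, and k=2 instead of 10, so that it runs without any download. The chi-square test requires non-negative feature values, which holds for this data.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Bundled example data (an assumption; the article uses train.csv).
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Score every feature against the target and keep the top 2.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)  # array with only the chosen columns

scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
selected_columns = X.columns[selector.get_support()].tolist()

print(scores)
print("selected columns:", selected_columns)
print("shape passed to the model:", X_selected.shape)
```

The get_support() mask maps the selection back to column names, and X_selected is what you would give to the model for training.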

Let’s look into the next technique called feature importance.

Feature Importance

Here, you can get the importance of every feature. The higher the score, the more important the feature is. A built-in scikit-learn classifier called ExtraTreesClassifier is used here to extract the best 10 features.

from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
model = ExtraTreesClassifier()
model.fit(X,y)

After fitting, you can see the scores of the features.

print(model.feature_importances_)

The best 10 features can be seen like this.

feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')
plt.show()

Let me explain the last technique.

Correlation Matrix with Heatmap

Here, we are checking each and every feature. The correlation can be plotted like this.

import seaborn as sns
# Pairwise Pearson correlation between all columns, including the target
corrmat = data.corr()
plt.figure(figsize=(20, 20))
g = sns.heatmap(corrmat, annot=True, cmap="RdYlGn")
plt.show()

Here, the correlation value ranges from -1 to +1. The correlation between price_range and ram is very high, and the correlation between battery and price_range is low.
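
Besides reading the heatmap by eye, you can rank features by the absolute value of their correlation with the target. The sketch below uses made-up data: the column names ram, battery, and noise, and the formula that generates target, are hypothetical and only loosely mimic the pattern described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
ram = rng.normal(size=n)
battery = rng.normal(size=n)
noise = rng.normal(size=n)

# Hypothetical target: driven strongly by ram and only weakly by battery.
target = 3.0 * ram + 0.3 * battery + rng.normal(size=n)

df = pd.DataFrame({"ram": ram, "battery": battery,
                   "noise": noise, "target": target})

# Pearson correlation of each feature with the target.
corr_with_target = df.corr()["target"].drop("target")

# Rank by absolute value, because a strong negative correlation is also informative.
ranked = corr_with_target.abs().sort_values(ascending=False)
print(ranked)
```

Keep in mind that Pearson correlation only captures linear relationships, so a feature with a low score may still be useful to a non-linear model.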

Why Is Feature Selection Important?

Here are the main reasons why feature selection is important:

Better Predictions

More Accurate Results: By selecting only the important features, the model can make better predictions. It focuses on the key information and ignores unnecessary details.
Reduces Overfitting: If a model learns too much from irrelevant data, it can become too complex and perform poorly on new data. Feature selection helps keep the model simpler, which leads to better performance.

Saves Time and Resources

Faster Training: With fewer features to analyze, the model can be trained more quickly. This is especially helpful when working with large datasets.
Less Computing Power Needed: A simpler model requires less memory and processing power, making it easier to run on regular computers.

Easier to Understand

Simpler Models: Models that use fewer features are easier to explain. This is important in areas like healthcare and finance, where understanding decisions is crucial.
Identifies Key Factors: Feature selection helps find which features are most important for making predictions. This gives valuable insights into what influences the results.

Better Handling of Large Datasets

Easier to Find Patterns: In datasets with many features, it can be hard to spot patterns. Feature selection reduces the number of features, making it easier to see what’s important.
Focus on Relevant Information: By removing unnecessary features, the model can concentrate on the most useful information, improving its learning ability.

Tips and Tricks for Feature Selection

Understand your data: Before starting feature selection, it is essential to understand your data and its properties, such as the type of features, their correlation, and the target variable.
Use domain knowledge: Incorporating domain knowledge into feature selection can lead to more relevant and meaningful features.
Consider multiple methods: Several feature selection methods are available, and it’s essential to try multiple methods to determine which one works best for your data.
Evaluate performance: It’s important to evaluate your model’s performance with different feature sets and select the one that yields the best results.
Use ensemble methods: Ensemble methods combine the results of multiple feature selection techniques and provide a more robust feature set.
Regularization: Regularization methods can penalize including irrelevant or redundant features in the model.
Visualize feature importance: Plotting feature importance scores can provide a better understanding of the relevance of each feature.
Avoid overfitting: Overfitting can occur when too many features are included in the model, resulting in poorer generalization performance. It’s important to balance the number of features and the model’s complexity.
Consider feature engineering: Feature engineering can be used to create new features that are more informative and relevant to the target variable.
Automate feature selection: Automated feature selection techniques can save time and reduce the risk of human error.

Master the ML Feature Selection Techniques

These are the basic techniques of feature selection. Now you know that the task is to choose which features are important with respect to the target output. Feature selection reduces the dimensionality of the data, improves model performance, and identifies the features that have the most predictive power. By using a variety of feature selection techniques such as filter, wrapper, and embedded methods, data scientists can select the best set of features for a given dataset and modeling approach.

Conclusion

Mastering feature selection techniques such as filter methods, wrapper methods (including forward selection, backward elimination, and recursive feature elimination), and embedded methods, along with tools like univariate selection and correlation matrix heatmaps, is crucial in machine learning. These approaches enhance model accuracy, reduce overfitting, and improve interpretability, resulting in efficient, robust models. I hope you enjoyed the article and now have a better understanding of feature selection algorithms and how these methods work.

Key Takeaways

Python libraries such as scikit-learn provide powerful tools for implementing feature selection methods, such as recursive feature elimination and LASSO. These methods are essential for enhancing model accuracy and efficiency in machine learning tasks.

Q1. What are feature selection techniques?

A. Feature selection techniques in machine learning involve selecting the most important features or variables from a dataset, to reduce the dimensionality of the data and improve model performance.

Q2. What are the 3 feature selection techniques?

A. The three main feature selection techniques are filter methods, wrapper methods, and embedded methods.

Q3. What are the 2 different techniques for feature selection?

A. The two main techniques for feature selection are feature ranking and feature subset selection.

Q4. Which technique is a popular technique used for feature attribute selection in machine learning?

A. Filter methods are a popular technique for feature attribute selection in machine learning. These methods rank the features based on statistical measures such as correlation or mutual information, and select the top-ranked features for the model.

Q5. What is an example of feature selection?

A. An example of feature selection is when a researcher tries to determine which variables to include in a regression model. They may use a feature selection method to identify the subset of variables that best predicts the outcome of interest.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Dhanya Thailappan

Predicting the future is not magic. It's an Artificial Intelligence!! This inspired me so much and that's why I love Data Science and Artificial Intelligence. I am currently working as a Data Engineer. I wish to explore more and share my knowledge with others.
