Feature selection plays a crucial role in building accurate and efficient machine learning models. In this article, you will learn what feature selection is and explore filter, wrapper, and embedded techniques that reduce data dimensionality and improve model performance. You will also see how to choose the most appropriate approach for your dataset and apply it in your machine learning projects.
This article was published as a part of the Data Science Blogathon.
Feature selection is an important process in machine learning and data analysis. It involves selecting a subset of relevant features from a larger set of available features. These features are also known as variables, predictors, or attributes. The primary objective of feature selection is to identify and retain the most informative and relevant features while discarding or ignoring the irrelevant or redundant ones. By doing so, we can improve the performance of our models by focusing on the most meaningful information and avoiding noise or unnecessary complexity.
Feature selection techniques in machine learning involve selecting the most relevant features or variables from a dataset, which helps to reduce the dimensionality of the data and improve model performance. There are various methods, including filter and wrapper methods, for selecting the best set of features for a given dataset. The goal is to eliminate irrelevant or redundant features while retaining those that have the most predictive power.
The choice of feature selection technique depends on the type and amount of data available, as well as the modeling approach. It’s important to experiment with different methods to find the best approach for a given problem.
Let’s understand each of these methods in depth!
First, we will look at the filter method.
In the filter method, we start with the full set of features and select the best subset using statistical measures. Only after that do we train the learning algorithm on the selected subset, so the selection step does not depend on any particular model.
How do we select the best subset?
We can apply various statistical techniques. Three common ones are the ANOVA F-test, the chi-square test, and the correlation coefficient. Each one scores how strongly a feature is related to the target output, and the highest-scoring features are kept as the important ones.
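The snippet below is a minimal sketch of how the three filter scores can be computed with scikit-learn. It is not taken from the original article. The Iris dataset is an assumption, used only because it ships with scikit-learn and makes the snippet self-contained. For the Pearson column, the class label is treated as a number, which is a simplification.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2, f_classif

# Iris is a stand-in dataset chosen only because it is bundled with scikit-learn.
X, y = load_iris(return_X_y=True, as_frame=True)

f_scores, _ = f_classif(X, y)   # ANOVA F-statistic of each feature vs. the class
chi_scores, _ = chi2(X, y)      # chi-square statistic (requires non-negative X)
pearson_r = X.corrwith(y)       # Pearson r with the label, treated as numeric

scores = pd.DataFrame(
    {"anova_f": f_scores, "chi2": chi_scores, "pearson_r": pearson_r.values},
    index=X.columns,
)
# Higher F / chi2 (or larger |r|) means a stronger relationship with the target.
print(scores.sort_values("anova_f", ascending=False))
```

All three scores rank the petal measurements above the sepal ones. In every case, a filter method keeps the top of such a ranking without training any model.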
Let’s take an example with an independent variable X and a target variable Y.
X | Y |
1 | 10 |
2 | 20 |
3 | 30 |
4 | 40 |
In this scenario, Y increases in proportion to X, so X and Y are perfectly positively correlated. Two related terms are worth distinguishing: covariance and correlation. Covariance shows the direction of the relationship, but it is unbounded because its magnitude depends on the units of X and Y. Correlation divides the covariance by the standard deviations of both variables, so it always lies between -1 and +1. This normalized measure is the Pearson correlation coefficient, and for the table above it equals +1.
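As a quick numeric check of the table above, here is a small NumPy sketch. NumPy is an assumption; the article does not use it at this point. Rescaling Y changes the covariance but leaves the Pearson correlation unchanged.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

cov_xy = np.cov(x, y)[0, 1]      # sample covariance (divides by n - 1)
r_xy = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient

# Multiply y by 10: covariance grows tenfold, correlation stays the same.
cov_scaled = np.cov(x, 10 * y)[0, 1]
r_scaled = np.corrcoef(x, 10 * y)[0, 1]

print(f"cov = {cov_xy:.2f}, r = {r_xy:.2f}")          # cov = 16.67, r = 1.00
print(f"cov = {cov_scaled:.2f}, r = {r_scaled:.2f}")  # cov = 166.67, r = 1.00
```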
The second technique is the wrapper method.
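Conceptually, the wrapper method is simpler than the filter method. Instead of applying statistical tests, it repeatedly trains a model on different feature subsets and keeps the subset that performs best. The trade-off is computational cost, because the model must be retrained many times. There are three basic mechanisms: forward selection, backward elimination, and recursive feature elimination.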
Let me explain it.
Forward selection is used to select the most important features of a dataset with respect to the target output. It is an iterative method: we start with no features in the model and add one feature in each iteration.
Let me explain this with an example.
I am considering A, B, C, D, and E as my independent features. Let F be the output or target feature.
In the first iteration, the model is trained on each feature separately (A, B, C, D, and E) and the accuracy of each is recorded. Suppose A gives the best accuracy, so A is kept. In the next iteration, each remaining feature is added to A in turn (AB, AC, AD, AE), and the best pair is kept if it beats the previous accuracy. This continues, adding one feature per iteration, until adding another feature no longer improves accuracy.
This is what forward selection is.
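The loop described above can be sketched as follows. This is a minimal illustration, not the article's code. The synthetic dataset (8 features, 3 informative), the logistic regression model, and the helper name `forward_selection` are all assumptions. Accuracy is measured with 5-fold cross-validation rather than on the training data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic data: 8 features, only 3 of them informative.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

def forward_selection(X, y, model, cv=5):
    """Greedily add the feature that most improves cross-validated accuracy."""
    remaining = list(range(X.shape[1]))
    selected, best_score, history = [], 0.0, []
    while remaining:
        # Try adding each remaining feature to the current subset.
        scores = {f: cross_val_score(model, X[:, selected + [f]], y, cv=cv).mean()
                  for f in remaining}
        best_f = max(scores, key=scores.get)
        if scores[best_f] <= best_score:
            break  # no candidate improves accuracy, so stop
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = scores[best_f]
        history.append(best_score)
    return selected, history

selected, history = forward_selection(X, y, LogisticRegression(max_iter=1000))
print("selected feature indices:", selected)
print("CV accuracy after each step:", [round(s, 3) for s in history])
```

scikit-learn also provides `SequentialFeatureSelector(direction="forward")`, which implements the same idea with a fixed target number of features.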
Next, we will look at backward elimination.
This works the other way round. Using the same example, A, B, C, D, and E are the independent features and F is the target variable. We start by training the model with all the independent features. A statistical test (for example, the p-values of the coefficients in a regression model) then tells us which feature has the lowest impact on the target variable, and that feature is removed. The model is retrained on the remaining features, and the process repeats until every remaining feature is significant. This is how backward elimination is implemented.
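A minimal sketch of backward elimination with scikit-learn's `SequentialFeatureSelector` follows. Note that this variant removes features based on cross-validated accuracy rather than p-values, so it is a score-based form of the procedure described above, not the statistical-test version. The synthetic data and the choice to keep 3 features are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data: 8 features, only 3 of them informative.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# direction="backward": start from all 8 features and repeatedly drop the one
# whose removal hurts cross-validated accuracy the least, until 3 remain.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3,
                                direction="backward", cv=5)
sfs.fit(X, y)
print("kept feature indices:", sfs.get_support(indices=True))
X_reduced = sfs.transform(X)  # only the kept columns
```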
Next, let me explain recursive feature elimination (RFE).
It is a greedy optimization algorithm whose aim is to find the best-performing feature subset. It does not pick features at random. Instead, it fits the model, ranks the features by importance (for example, by the magnitude of their coefficients), removes the least important one, and repeats the process on the remaining features. Finally, it ranks all the features by the order in which they were eliminated and keeps the top-ranked ones.
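scikit-learn implements this procedure as `RFE`. The sketch below is not from the article. It uses synthetic data (an assumption) and logistic regression, whose coefficient magnitudes serve as the importance measure.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data: 8 features, only 3 of them informative.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# step=1: refit the model and drop the single weakest feature (smallest |coef|)
# in each round until 3 features are left.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3, step=1)
rfe.fit(X, y)
print("selected:", rfe.get_support(indices=True))
print("ranking :", rfe.ranking_)  # 1 = selected; larger = eliminated earlier
```

With 8 features and one elimination per round, the ranking contains three 1s, followed by 2 through 6 for the eliminated features.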
Keep in mind that these wrapper techniques retrain the model many times, so they are practical mainly when the dataset is small.
In reality, however, you will often work with large datasets.
Let’s try to understand the third technique called embedded methods.
Let me start with an example. Again, A, B, C, D, and E are the independent variables and F is the target variable. Embedded methods perform feature selection as part of training the model itself, rather than as a separate step. For example, a linear model trained with L1 (LASSO) regularization on A to E shrinks the coefficients of unhelpful features to exactly zero, so the features with non-zero coefficients form the selected subset. Tree-based models similarly compute feature importances while they are trained. Because selection happens within a single training run, embedded methods scale much better than trying every permutation and combination of features. That exhaustive search is a wrapper-style approach and is feasible only for a handful of features.
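Here is a minimal sketch of LASSO as an embedded method. It is not from the original article. The synthetic regression data (10 features, 3 informative) and `alpha=1.0` are assumptions. In practice, alpha is usually tuned, for example with `LassoCV`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Hypothetical synthetic data: 10 features, only 3 actually drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# The L1 penalty (alpha) pushes coefficients of unhelpful features to exactly 0
# while the model is being fitted, so selection happens during training.
lasso = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of non-zero coefficients
print("coefficients:", np.round(lasso.coef_, 2))
print("selected feature indices:", selected)
```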
Let’s go and find out how univariate selection is done.
Univariate selection uses statistical tests to score each feature individually and keeps the features that have the strongest relationship with the target variable.
Here, I am using scikit-learn's SelectKBest class. If you set k to 5, for example, it selects the 5 features with the best scores with respect to the target variable.
I am using the mobile price classification dataset. You can download it here.
```python
import pandas as pd
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

data = pd.read_csv("train.csv")
X = data.iloc[:, 0:20]  # the first 20 columns are the phone features
y = data.iloc[:, -1]    # the last column, price_range, is the target
```
The dataset has many features, and we want to keep only the most useful ones. Because of the curse of dimensionality, once the number of features passes a certain threshold, adding more features tends to decrease the model's accuracy rather than improve it.
For that, I am using univariate selection with SelectKBest and the chi-square score function.
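```python
# Score every feature with the chi-square test and keep the 10 best.
# chi2 accepts only non-negative feature values (counts, flags, measurements).
bestfeatures = SelectKBest(score_func=chi2, k=10)
fit = bestfeatures.fit(X, y)
```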
After fitting, the selector exposes two attributes: scores_, which holds the chi-square statistic of each feature, and pvalues_, which holds the corresponding p-values.
```python
# One row per feature: its chi-square score and its column name
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
```
The next statement concatenates the two frames side by side for easier reading and renames the columns to Specs and Score.
```python
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Specs', 'Score']
```
Printing featureScores shows every feature with its score. The higher the score, the more important the feature is. Here, ram has the highest score.
```python
print(featureScores)
```
I am printing the top 10 features.
```python
print(featureScores.nlargest(10, 'Score'))
```
These 10 best features can be used to train the model.
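To actually train on the selected features, the fitted selector can return their names and the reduced feature matrix. The sketch below swaps in the bundled Iris dataset (an assumption) so that it runs without train.csv, and it keeps k=2 instead of 10 because Iris has only four features.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Iris stands in for the mobile-price data so the snippet runs on its own.
X, y = load_iris(return_X_y=True, as_frame=True)

selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
kept = X.columns[selector.get_support()]  # names of the k best features
X_reduced = selector.transform(X)         # same rows, only the k best columns

print("kept features:", list(kept))
print("shape before/after:", X.shape, X_reduced.shape)
```

X_reduced (or X[kept]) is what you would pass to the model's fit method.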
Let’s look into the next technique called feature importance.
Here, you get an importance score for every feature. The higher the score, the more important the feature is. This example uses scikit-learn's ExtraTreesClassifier, a tree ensemble whose fitted model exposes these scores, and then takes the 10 most important features.
```python
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt

# A forest of randomized trees; its impurity-based importances serve as scores
model = ExtraTreesClassifier()
model.fit(X, y)
```
After fitting, the feature_importances_ attribute holds one importance score per feature.
```python
print(model.feature_importances_)
```
The 10 most important features can be plotted as a horizontal bar chart.
```python
# Pair each importance with its column name and plot the 10 largest
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')
plt.show()
```
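The plot shows the ranking, but you still need to subset the data to train on the top features. Here is a minimal sketch of that step. The synthetic data, the feature names f0 to f7, and the choice of 3 features are assumptions. Setting `random_state` makes the randomized importances reproducible between runs.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical synthetic data with placeholder column names f0..f7.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

model = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)

top_k = importances.nlargest(3).index  # names of the 3 most important features
X_top = X[top_k]                       # keep only those columns for training
print(importances.sort_values(ascending=False).round(3))
```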
Let me explain the last technique.
It uses a correlation matrix with a heatmap. We compute the correlation between every pair of columns, including the target, and plot the result.
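```python
import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise Pearson correlations between all columns, including price_range
corrmat = data.corr()
plt.figure(figsize=(20, 20))
# annot=True writes each coefficient in its cell; RdYlGn runs red (negative) to green (positive)
sns.heatmap(corrmat, annot=True, cmap="RdYlGn")
plt.show()
```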
Correlation values range from -1 to +1. Here, the correlation between price_range and ram is very high, while the correlation between battery power and price_range is low.
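A heatmap is read by eye. The same information can also drive selection programmatically: rank features by their absolute correlation with the target, then flag one feature from each highly inter-correlated pair as redundant. The sketch below is an illustration, not the article's method. The synthetic columns (x1, x1_copy, x2, noise, target) and the 0.9 threshold are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x1_copy": 2 * x1 + rng.normal(scale=0.01, size=n),  # near-duplicate of x1
    "x2": x2,
    "noise": rng.normal(size=n),
})
df["target"] = 3 * x1 + 0.5 * x2 + rng.normal(scale=0.1, size=n)

corr = df.corr()
# Rank features by |r| with the target.
target_corr = corr["target"].drop("target").abs().sort_values(ascending=False)
print(target_corr)

# For each pair of features with |r| above the threshold, flag the later one.
threshold = 0.9
features = corr.columns.drop("target")
to_drop = set()
for i, a in enumerate(features):
    for b in features[i + 1:]:
        if abs(corr.loc[a, b]) > threshold and a not in to_drop:
            to_drop.add(b)
print("redundant features:", sorted(to_drop))
```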
Here is why feature selection is important:
These are the basic techniques of feature selection. The goal is to choose the features that matter most with respect to the target output. Feature selection reduces the dimensionality of the data, improves model performance, and identifies the features with the most predictive power. By using filter, wrapper, and embedded methods, data scientists can select the best set of features for a given dataset and modeling approach.
Mastering feature selection techniques is crucial in machine learning. These include filter methods, wrapper methods (forward selection, backward elimination, and recursive feature elimination), embedded methods, and practical tools such as univariate selection, feature importance, and correlation matrix heatmaps. These approaches improve model accuracy, reduce overfitting, and make models easier to interpret.
Hope you liked this article on feature selection in machine learning. Python libraries such as scikit-learn provide ready-made implementations of many of these methods, including recursive feature elimination and LASSO-based selection, which help make machine learning models more accurate and efficient.
A. Feature selection techniques in machine learning involve selecting the most important features or variables from a dataset to reduce the dimensionality of the data and improve model performance.
A. The three main feature selection techniques are filter methods, wrapper methods, and embedded methods.
A. The two main techniques for feature selection are feature ranking and feature subset selection.
A. Filter methods are a popular technique for feature selection in machine learning. These methods rank the features based on statistical measures such as correlation or mutual information, and select the top-ranked features for the model.
A. An example of feature selection is when a researcher tries to determine which variables to include in a regression model. They may use a feature selection method to identify the subset of variables that best predicts the outcome of interest.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.