Principal Component Analysis

Himanshi Singh Last Updated : 04 Apr, 2025


You’re working on a large-scale data science project, and the given dataset has too many variables, leading to a few possible situations you might encounter. For instance, upon analysis, you might find that most variables are correlated, leaving you indecisive about how to proceed. Losing patience, you might decide to run a model on the entire dataset, only to get poor accuracy, leaving you frustrated and prompting you to think of a strategic method to identify a few important variables. This is where Principal Component Analysis (PCA) comes into play. Trust me, dealing with such situations isn’t as difficult as it sounds, as this is a common scenario in machine learning projects. Statistical techniques like factor analysis and PCA help overcome these challenges. In this post, I’ve explained the concept of PCA in a simple and informative way, and I’ve also demonstrated how to use this technique in R, complete with interpretations for practical understanding.

What is Principal Component Analysis?
PCA vs LDA vs Factor Analysis
What are Principal Components?
- First Principal Component
- Second Principal Component (Z²)
How Does Principal Component Analysis (PCA) Work?
Principal Component Analysis (PCA) Examples
Why is Normalization of Variables Necessary in Principal Component Analysis (PCA)?
Implement PCA in R & Python (With Interpretation)
Predictive Modeling With PCA Components

Key Takeaways

Frequently Asked Questions

What is Principal Component Analysis?

Principal Component Analysis (PCA) is a powerful technique used in data analysis, particularly for reducing the dimensionality of datasets while preserving crucial information. It does this by transforming the original variables into a set of new, uncorrelated variables called principal components. Here’s a breakdown of PCA’s key aspects:

Dimensionality Reduction: PCA helps manage high-dimensional datasets by extracting essential information and discarding less relevant features, simplifying analysis.
Data Exploration and Visualization: It plays a significant role in data exploration and visualization, aiding in uncovering hidden patterns and insights.
Linear Transformation: PCA performs a linear transformation of data, seeking directions of maximum variance.
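Feature Extraction: Principal components are ranked by the variance they explain, so the top few can be kept as new features. Each component combines all original variables, so this is feature extraction rather than selection of original variables.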
Data Compression: PCA can compress data while preserving most of the original information.
Clustering and Classification: It finds applications in clustering and classification tasks by reducing noise and highlighting underlying structure.
Advantages: PCA offers linearity, computational efficiency, and scalability for large datasets.
Limitations: It captures only linear structure, is sensitive to the scale of the variables, and may lead to information loss when components are discarded.
Matrix Requirements: PCA works with symmetric correlation or covariance matrices and requires numeric, standardized data.
Eigenvalues and Eigenvectors: Eigenvalues represent variance magnitude, and eigenvectors indicate variance direction.
Number of Components: The number of principal components chosen determines the number of eigenvectors computed.


PCA Example

Let’s say we have a data set of dimension 300 (n) × 50 (p). n represents the number of observations, and p represents the number of predictors. Since we have a large p = 50, there can be p(p-1)/2 scatter plots, i.e., more than 1000 plots possible to analyze the variable relationship. Wouldn’t it be a tedious job to perform exploratory analysis on this data?

In this case, a clearer approach is to construct a small number of new dimensions (far fewer than 50) that capture as much of the information as possible, and then plot the observations in the resulting low-dimensional space.

The image below shows the transformation of high-dimensional data (3 dimensions) to low-dimensional data (2 dimensions) using PCA. Note that each resultant dimension is a linear combination of the p features.
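To make the numbers concrete, here is a minimal sketch that counts the pairwise scatter plots for p = 50 and projects a 300 × 50 data set onto two dimensions. The data is random and purely illustrative (an assumption, not the article's data), and the projection uses NumPy's SVD on centered data as one possible way to obtain the principal directions.

```python
import numpy as np

n, p = 300, 50
n_scatter_plots = p * (p - 1) // 2  # number of distinct variable pairs

# Synthetic stand-in data (assumption: random values, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(n, p))

# Center the data, then use the SVD to get principal directions (rows of Vt)
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the first two directions; each new axis is a linear
# combination of all p original features
X_2d = X_centered @ Vt[:2].T

print(n_scatter_plots)  # 1225
print(X_2d.shape)       # (300, 2)
```

Two columns are enough for a single scatter plot, compared with 1225 pairwise plots on the original variables.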

PCA vs LDA vs Factor Analysis

The table provides a concise comparison of three dimensionality reduction techniques: PCA, LDA, and Factor Analysis. It outlines their key characteristics, with PCA and Factor Analysis being unsupervised methods, and LDA being supervised. PCA and Factor Analysis aim to reduce dimensions and simplify data, while LDA seeks class separation.

Technique	Description
PCA (Principal Component Analysis)	Unsupervised dimension reduction technique. Reduces dimensions without considering class labels. Transforms correlated variables into linearly uncorrelated principal components that capture most of the data variance. Useful for data visualization and simplifying complex data.
LDA (Linear Discriminant Analysis)	Supervised dimension reduction technique. Takes class labels into account to find a feature combination that maximizes class separation. Useful for classification tasks and finding discriminant features.
Factor Analysis	Used to identify underlying, unmeasured variables (factors) that explain the variability across observed variables. Focuses on understanding latent structures in the data. Useful for revealing relationships and reducing dimensions based on these latent factors.
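The practical difference between the three techniques shows up in how they are fitted. The sketch below uses scikit-learn and its bundled iris data set (my choice for illustration, not something used in this article): PCA and Factor Analysis are fitted on the predictors alone, while LDA also needs the class labels.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative data: 150 observations, 4 predictors, 3 classes
X, y = load_iris(return_X_y=True)

# Unsupervised: only X is passed
X_pca = PCA(n_components=2).fit_transform(X)
X_fa = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)

# Supervised: class labels y are required to maximize class separation
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_fa.shape, X_lda.shape)
```

All three return a two-column representation here, but only the LDA axes are chosen with the labels in mind.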

What are Principal Components?

A principal component is a normalized linear combination of the original features in a data set. In the image above, PC1 and PC2 are the principal components. Let’s say we have a set of predictors X¹, X², ..., X^p.

The first principal component can be written as:

Z¹ = Φ¹¹X¹ + Φ²¹X² + Φ³¹X³ + .... + Φ^p¹X^p

where,

Z¹ is the first principal component.
Φ¹ = (Φ¹¹, Φ²¹, ..., Φ^p¹) is the loading vector comprising the loadings of the first principal component. The loadings are constrained so that their sum of squares equals 1; without this constraint, the variance of Z¹ could be made arbitrarily large simply by inflating the loadings. The loading vector defines the direction of the principal component (Z¹) along which the data varies the most. It results in a line in p-dimensional space that is closest to the n observations, where closeness is measured using the average squared Euclidean distance.
X¹..X^p are normalized predictors. Normalized predictors have mean values equal to zero and standard deviations equal to one.
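A small numerical sketch of this definition follows. It assumes synthetic correlated data and obtains the loadings from an eigen-decomposition of the covariance matrix of the normalized predictors; the names `phi1` and `Z1` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated predictors (assumption: illustrative data only)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(200, 1)) for _ in range(3)])

# Normalize predictors: mean 0, standard deviation 1
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Loadings of the first component = eigenvector with the largest eigenvalue
cov = np.cov(X_norm, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
phi1 = eigvecs[:, -1]                   # loading vector of the first component

# Z1 is the loading-weighted sum of the normalized predictors
Z1 = X_norm @ phi1

print(np.sum(phi1 ** 2))            # sum of squared loadings
print(Z1.var(ddof=1), eigvals[-1])  # variance of Z1 and the largest eigenvalue
```

The squared loadings sum to one, and the variance of Z1 matches the largest eigenvalue, which is the maximum variance any unit-length combination can reach.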

First Principal Component

The first principal component is a linear combination of the original predictor variables that captures the maximum variance in the data set. It determines the direction of highest variability in the data. The larger the variability captured by the first component, the more information it captures. No other component can have higher variability than the first principal component.

The first principal component results in a line that is closest to the data, i.e., it minimizes the sum of squared distance between a data point and the line.

Similarly, we can compute the second principal component also.


Second Principal Component (`Z²`)

The second principal component is also a linear combination of original predictors, which captures the remaining variance in the data set and is uncorrelated with Z¹. In other words, the correlation between first and second components should be zero. It can be represented as:

Z² = Φ¹²X¹ + Φ²²X² + Φ³²X³ + .... + Φ^p²X^p

If the two components are uncorrelated, their directions should be orthogonal (image below). This image is based on simulated data with 2 predictors. Notice the direction of the components; as expected, they are orthogonal. This suggests the correlation between these components is zero.

PCA : Orthogonality of Principal Components

All succeeding principal components follow a similar concept, i.e., they capture the remaining variation without being correlated with the previous components. In general, for n × p dimensional data, min(n-1, p) principal components can be constructed.

The directions of these components are identified unsupervised; i.e., the response variable(Y) is not used to determine the component direction. Therefore, it is an unsupervised approach.

Note: Partial least square (PLS) is a supervised alternative to PCA. PLS assigns a higher weight to variables that are strongly related to response variable to determine principal components.
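The orthogonality and component-count claims in this section can be checked numerically. The sketch below is an illustration on synthetic data with fewer observations than predictors (n = 5, p = 8, both arbitrary choices), using the SVD of the centered data to get loadings and scores.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 8                  # fewer observations than predictors
X = rng.normal(size=(n, p))  # synthetic data (assumption)
Xc = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt.T              # columns are loading vectors
scores = Xc @ loadings       # columns are component scores

# Loading vectors are orthonormal: loadings^T loadings = identity
gram = loadings.T @ loadings

# Components with non-negligible variance: at most min(n - 1, p)
n_components = int(np.sum(S > 1e-10))

# Scores of distinct components are uncorrelated (zero inner product
# is enough here because the score columns have mean zero)
z1, z2 = scores[:, 0], scores[:, 1]

print(np.allclose(gram, np.eye(gram.shape[0])))
print(n_components, min(n - 1, p))
print(abs(np.dot(z1, z2)) < 1e-10)
```

Centering removes one degree of freedom, which is why only n - 1 components carry variance when n is smaller than p.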

How Does Principal Component Analysis (PCA) Work?

1. Standardize the Data: If the features of your dataset are on different scales, it’s essential to standardize them (subtract the mean and divide by the standard deviation).
2. Compute the Covariance Matrix: Calculate the covariance matrix for the standardized dataset.
3. Compute Eigenvectors and Eigenvalues: Find the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors represent the directions of maximum variance, and the corresponding eigenvalues indicate the magnitude of variance along those directions.
4. Sort Eigenvectors by Eigenvalues: Sort the eigenvectors based on their corresponding eigenvalues in descending order.
5. Choose Principal Components: Select the top k eigenvectors (principal components), where k is the desired dimensionality of the reduced dataset.
6. Transform the Data: Multiply the original standardized data by the selected principal components to obtain the new, lower-dimensional representation of the data.
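The steps above can be sketched directly in NumPy. This is a minimal illustration on random data, not a replacement for a library implementation; the function name `pca_from_scratch` and its return values are my own choices.

```python
import numpy as np

def pca_from_scratch(X, k):
    """Return (scores, components, eigenvalues) for the top-k components."""
    # 1. Standardize: zero mean, unit standard deviation per column
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition (eigh is meant for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvalues and eigenvectors in descending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Keep the top-k eigenvectors
    components = eigvecs[:, :k]
    # 6. Project the standardized data onto them
    scores = X_std @ components
    return scores, components, eigvals

rng = np.random.default_rng(42)
X = rng.random((100, 3))  # 100 samples with 3 features (illustrative)
scores, components, eigvals = pca_from_scratch(X, k=2)

print(scores.shape)   # (100, 2)
print(eigvals.sum())  # total variance of 3 standardized features
```

Because every standardized feature has variance one, the eigenvalues sum to the number of features, which is a quick sanity check on the decomposition.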

PCA in Python

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Example Data
np.random.seed(42)
X = np.random.rand(100, 3)  # 100 samples with 3 features

# Step 1: Standardize the Data
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Step 2-5: PCA
pca = PCA()
X_pca = pca.fit_transform(X_std)

# Plot Explained Variance Ratio
explained_var_ratio = pca.explained_variance_ratio_
cumulative_var_ratio = np.cumsum(explained_var_ratio)

plt.plot(range(1, len(cumulative_var_ratio) + 1), cumulative_var_ratio, marker='o')
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance Ratio')
plt.title('Explained Variance Ratio vs. Number of Principal Components')
plt.show()

Principal Component Analysis (PCA) Examples

Image Compression: It reduces image dimensionality for efficient storage without losing critical information.
Genomic Data Analysis: PCA identifies patterns in gene expression data, aiding in disease research.
Financial Data Analysis: It analyzes covariance in asset returns for portfolio optimization.
Spectral Analysis: PCA helps in signal processing to identify dominant spectral features.
Customer Segmentation: It reduces customer behavior data to a few components, which makes clustering customers for targeted marketing easier.

Why is Normalization of Variables Necessary in Principal Component Analysis (PCA)?

The principal components are computed from a normalized version of the original predictors. This is because the original predictors may have different scales. For example, imagine a data set with variables measured in gallons, kilometers, light years, etc. The variances of these variables will obviously differ by orders of magnitude.

Performing PCA on un-normalized variables will lead to disproportionately large loadings for variables with high variance. In turn, this will make a principal component depend mostly on the variable with high variance. This is undesirable.

As shown in the image below, PCA was run on a data set twice (with unscaled and scaled predictors). This data set has ~40 variables. You can see a variable Item_MRP dominates first principal component and a variable Item_Weight dominates the second principal component. This domination prevails due to high value of variance associated with a variable. When the variables are scaled, we get a much better representation of variables in 2D space.
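The same effect can be reproduced on a toy example. The sketch below assumes two synthetic variables on very different scales (they are stand-ins, not Item_MRP or Item_Weight) and compares the first loading vector with and without scaling.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
# Two synthetic variables on very different scales (assumption)
small = rng.normal(scale=1.0, size=n)     # e.g. a weight-like variable
large = rng.normal(scale=1000.0, size=n)  # e.g. a price-like variable
X = np.column_stack([small, large])

def first_loading(data):
    """Loading vector of the first principal component of `data`."""
    centered = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[0]

unscaled = first_loading(X)
scaled = first_loading(X / X.std(axis=0))

print(np.abs(unscaled))  # almost all weight on the large-variance variable
print(np.abs(scaled))    # equal weight on both variables after scaling
```

Without scaling, the first component is essentially the high-variance variable by itself; after scaling, neither variable dominates merely because of its units.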

Implement PCA in R & Python (With Interpretation)

How many principal components to choose from the original dataset? I could dive deep into theory, but it would be better to answer these questions practically.

For this demonstration, I’ll be using the data set from Big Mart Prediction Challenge III.

Remember, Principal Component Analysis can be applied only to numerical data. Therefore, if the data have categorical variables, they must be converted to numerical ones. Also, make sure you have done the basic data cleaning prior to implementing this technique.

Data Loading and Cleaning

Let’s quickly finish with initial data loading and cleaning steps:

#directory path
 > path <- ".../Data/Big_Mart_Sales"
#set working directory
 > setwd(path)
#load train and test file
 > train <- read.csv("train_Big.csv")
 > test <- read.csv("test_Big.csv")
#add a column
 > test$Item_Outlet_Sales <- 1
#combine the data set
 > combi <- rbind(train, test)
#impute missing values with median
 > combi$Item_Weight[is.na(combi$Item_Weight)] <- median(combi$Item_Weight, na.rm = TRUE)
#impute 0 with median
 > combi$Item_Visibility <- ifelse(combi$Item_Visibility == 0, median(combi$Item_Visibility), combi$Item_Visibility)
#find mode and impute
 > table(combi$Outlet_Size, combi$Outlet_Type)
 > levels(combi$Outlet_Size)[1] <- "Other"

Removing the Dependent and Other Identifier Variables

So far, we’ve imputed missing values. Now we are left with removing the dependent (response) variable and other identifier variables (if any). As we said above, we are practicing an unsupervised learning technique; hence the response variable must be removed.

#remove the dependent and identifier variables

 > my_data <- subset(combi, select = -c(Item_Outlet_Sales, Item_Identifier, Outlet_Identifier))

Let’s check the available variables ( a.k.a predictors) in the data set.

#check available variables

 > colnames(my_data)

Since PCA works on numeric variables, let’s see if we have any variables other than numeric.

#check variable class

 > str(my_data)

'data.frame': 14204 obs. of 9 variables:

 $ Item_Weight : num 9.3 5.92 17.5 19.2 8.93 ...

 $ Item_Fat_Content : Factor w/ 5 levels "LF","low fat",..: 3 5 3 5 3 5 5 3 5 5 ...

 $ Item_Visibility : num 0.016 0.0193 0.0168 0.054 0.054 ...

 $ Item_Type : Factor w/ 16 levels "Baking Goods",..: 5 15 11 7 10 1 14 14 6 6 ...

 $ Item_MRP : num 249.8 48.3 141.6 182.1 53.9 ...

 $ Outlet_Establishment_Year: int 1999 2009 1999 1998 1987 2009 1987 1985 2002 2007 ...

 $ Outlet_Size : Factor w/ 4 levels "Other","High",..: 3 3 3 1 2 3 2 3 1 1 ...

 $ Outlet_Location_Type : Factor w/ 3 levels "Tier 1","Tier 2",..: 1 3 1 3 3 3 3 3 2 2 ...

 $ Outlet_Type : Factor w/ 4 levels "Grocery Store",..: 2 3 2 1 2 3 2 4 2 2 ...

One-Hot Encoding

Sadly, 6 out of 9 variables are categorical in nature (the five factors above, plus Outlet_Establishment_Year, which we treat as categorical). We have some additional work to do now. We’ll convert these categorical variables into numeric ones using one-hot encoding.

#load library

> library(dummies)

#create a dummy data frame

> new_my_data <- dummy.data.frame(my_data, names = c("Item_Fat_Content","Item_Type",

                                "Outlet_Establishment_Year","Outlet_Size",

                                "Outlet_Location_Type","Outlet_Type"))

To check if we now have a data set of integer values, simply write:

#check the data set
> str(new_my_data)
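For Python readers, a comparable encoding step can be sketched with pandas. The tiny data frame below is hypothetical; it only borrows three column names from the data set, and the values are made up for illustration.

```python
import pandas as pd

# Tiny hypothetical frame with column names borrowed from the article's data
df = pd.DataFrame({
    "Item_Weight": [9.3, 5.92, 17.5],
    "Outlet_Size": ["Medium", "Other", "High"],
    "Outlet_Establishment_Year": [1999, 2009, 1999],
})

# One indicator column per category; the year is treated as categorical
encoded = pd.get_dummies(
    df,
    columns=["Outlet_Size", "Outlet_Establishment_Year"],
    dtype=int,
)

print(encoded.columns.tolist())
print(all(pd.api.types.is_numeric_dtype(t) for t in encoded.dtypes))
```

Every resulting column is numeric, which is what PCA needs as input.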

Divide Data in Test and Train

And we now have all the numerical values. Let’s divide the data into test and train.

#divide the new data

> pca.train <- new_my_data[1:nrow(train),]

> pca.test <- new_my_data[-(1:nrow(train)),]

We can now go ahead with PCA.

The base R function prcomp() is used to perform PCA. By default, it centers the variable to have a mean equal to zero. With parameter scale. = T, we normalize the variables to have a standard deviation equal to 1.

#principal component analysis

 > prin_comp <- prcomp(pca.train, scale. = T)

 > names(prin_comp)

 [1] "sdev"     "rotation" "center"   "scale"    "x"

The prcomp() Function Results in 5 Useful Measures

1. center and scale

These refer to the respective mean and standard deviation of the variables that are used for normalization prior to implementing PCA.

#outputs the mean of variables

 prin_comp$center

#outputs the standard deviation of variables

 prin_comp$scale

2. Rotation

The rotation measure provides the principal component loading. Each column of rotation matrix contains the principal component loading vector. This is the most important measure we should be interested in.

> prin_comp$rotation

This returns 44 principal component loadings. Is that correct? Absolutely. The maximum number of principal component loadings in a data set is min(n-1, p). Let’s look at the first 4 principal components and first 5 rows.

> prin_comp$rotation[1:5,1:4]

                                PC1            PC2            PC3             PC4

Item_Weight                0.0054429225   -0.001285666   0.011246194   0.011887106

Item_Fat_ContentLF        -0.0021983314    0.003768557  -0.009790094  -0.016789483

Item_Fat_Contentlow fat   -0.0019042710    0.001866905  -0.003066415  -0.018396143

Item_Fat_ContentLow Fat    0.0027936467   -0.002234328   0.028309811   0.056822747

Item_Fat_Contentreg        0.0002936319    0.001120931   0.009033254  -0.001026615

3. Principal Component Score

To compute the principal component score vector, we don’t need to multiply the loading with data. Rather, the matrix x has the principal component score vectors in an 8523 × 44 dimension.

> dim(prin_comp$x) 
[1] 8523    44
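To see how the pieces relate, the sketch below rebuilds rough analogues of center, scale, rotation, x, and sdev in NumPy on synthetic data. It is an illustration of the relationships under my own variable names, not a reimplementation of prcomp().

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 4))        # synthetic data (assumption)

center = X.mean(axis=0)             # analogue of prin_comp$center
scale = X.std(axis=0, ddof=1)       # analogue of prin_comp$scale
X_std = (X - center) / scale

_, S, Vt = np.linalg.svd(X_std, full_matrices=False)
rotation = Vt.T                     # analogue of prin_comp$rotation
scores = X_std @ rotation           # analogue of prin_comp$x
sdev = S / np.sqrt(X.shape[0] - 1)  # analogue of prin_comp$sdev

print(scores.shape)
print(np.allclose(scores.std(axis=0, ddof=1), sdev))
```

The score matrix is simply the standardized data multiplied by the loading matrix, and the standard deviation of each score column is the corresponding sdev entry.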

Let’s plot the resultant principal components.

The plot described here is a biplot (in base R, biplot(prin_comp, scale = 0)). The parameter scale = 0 ensures that arrows are scaled to represent the loadings. To infer from the plot, focus on this graph’s extreme ends (top, bottom, left, right).

We infer that the first principal component corresponds to Outlet_TypeSupermarket, Outlet_Establishment_Year 2007. Similarly, it can be said that the second component corresponds to a measure of Outlet_Location_TypeTier1, Outlet_Sizeother. For the exact measure of a variable in a component, you should look at rotation matrix(above) again.

4. Standard Deviation (sdev)

This function also provides the facility to compute standard deviation of each principal component. sdev refers to the standard deviation of principal components.

#compute standard deviation of each principal component
> std_dev <- prin_comp$sdev
#compute variance
> pr_var <- std_dev^2
#check variance of first 10 components
> pr_var[1:10]
[1] 4.563615 3.217702 2.744726 2.541091 2.198152 2.015320 1.932076 1.256831
[9] 1.203791 1.168101

We aim to find the components that explain the maximum variance, because we want to retain as much information as possible using these components. So, the higher the explained variance, the more information is contained in those components.

5. Sum total of Variance

To compute the proportion of variance explained by each component, we simply divide the variance by sum of total variance. This results in:

#proportion of variance explained

 > prop_varex <- pr_var/sum(pr_var)

 > prop_varex[1:20]

 [1] 0.10371853 0.07312958 0.06238014 0.05775207 0.04995800 0.04580274

 [7] 0.04391081 0.02856433 0.02735888 0.02654774 0.02559876 0.02556797

 [13] 0.02549516 0.02508831 0.02493932 0.02490938 0.02468313 0.02446016

 [19] 0.02390367 0.02371118

This shows that the first principal component explains 10.4% of the variance, the second explains 7.3%, the third explains 6.2%, and so on. So, how do we decide how many components to select for the modeling stage?


Scree Plot

The answer to this question is provided by a scree plot. A scree plot is used to assess which components or factors explain most of the variability in the data. It represents values in descending order.

#scree plot
 
> plot(prop_varex, xlab = "Principal Component",

             ylab = "Proportion of Variance Explained",

             type = "b")

The plot above shows that ~30 components explain around 98.4% of the variance in the data set. In other words, using PCA we have reduced 44 predictors to 30 without compromising much on explained variance. This is the power of PCA. Let’s do a confirmation check by plotting a cumulative variance plot. This will give us a clear picture of the number of components.

#cumulative scree plot
> plot(cumsum(prop_varex), xlab = "Principal Component",
              ylab = "Cumulative Proportion of Variance Explained",
              type = "b")

This plot shows that 30 components result in a variance close to ~ 98%. Therefore, in this case, we’ll select the number of components as 30 [PC1 to PC30] and proceed to the modeling stage. This completes the steps to implement PCA on train data. For modeling, we’ll use these 30 components as predictor variables and follow the normal procedures.
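Reading the number of components off a plot can also be done programmatically. The helper below is a hypothetical sketch (the function name and the example variances are made up) that returns the smallest number of leading components whose cumulative share of variance reaches a chosen threshold.

```python
import numpy as np

def components_for_threshold(variances, threshold):
    """Smallest number of leading components whose cumulative
    share of variance reaches `threshold` (a fraction in (0, 1])."""
    variances = np.asarray(variances, dtype=float)
    cumulative = np.cumsum(variances) / variances.sum()
    # Index of the first cumulative share >= threshold, converted to a count
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical component variances, already sorted in descending order
pr_var = [4.0, 3.0, 2.0, 0.6, 0.3, 0.1]
k = components_for_threshold(pr_var, threshold=0.95)

print(k)  # 4
```

Here the cumulative shares are 0.40, 0.70, 0.90, 0.96, 0.99, and 1.00, so four components are the fewest that reach 95%.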

Predictive Modeling With PCA Components

Principal Component Analysis (PCA) is used along with machine learning algorithms for predictive modeling.

After we’ve performed PCA on the training set, let’s now understand the process of predicting test data using these components. The process is simple. Just like we’ve obtained PCA components on the training set, we’ll get another bunch of components on the testing set. Finally, we train the model.

But there are a few important points to understand:

We should not combine the train and test set to obtain PCA components of the whole data at once, as this would violate the assumption of generalization since the test data would get ‘leaked’ into the training set. In other words, the test data set would no longer remain ‘unseen’. Eventually, this will hammer down the generalization capability of the model.
We should not perform PCA on test and train data sets separately because the resultant vectors from the train and test PCAs will have different directions (due to unequal variance). Due to this, we’ll end up comparing data registered on different axes. Therefore, the resulting train and test data vectors should have same axes.

So, what should we do?

We should do exactly the same transformation to the test set as we did to the training set, including the center and scaling feature. Let’s do it in R:

#add a training set with principal components
> train.data <- data.frame(Item_Outlet_Sales = train$Item_Outlet_Sales, prin_comp$x)
#we are interested in first 30 PCAs
> train.data <- train.data[,1:31]
#run a decision tree
> install.packages("rpart")
> library(rpart)
> rpart.model <- rpart(Item_Outlet_Sales ~ .,data = train.data, method = "anova")
> rpart.model
#transform test into PCA
> test.data <- predict(prin_comp, newdata = pca.test)
> test.data <- as.data.frame(test.data)
#select the first 30 components
> test.data <- test.data[,1:30]
#make prediction on test data
> rpart.prediction <- predict(rpart.model, test.data)
#For fun, finally check your score of leaderboard
> sample <- read.csv("SampleSubmission_TmnO39y.csv")
> final.sub <- data.frame(Item_Identifier = sample$Item_Identifier, Outlet_Identifier = sample$Outlet_Identifier, Item_Outlet_Sales = rpart.prediction)
> write.csv(final.sub, "pca.csv",row.names = F)

That’s the complete modeling process after PCA extraction. I’m sure you wouldn’t be happy with your leaderboard rank after you upload the solution. Try using random forest!
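The same fit-on-train, apply-to-test discipline can be sketched in Python with a scikit-learn pipeline. The data below is synthetic and the component count of 5 is arbitrary; the point is only that the scaler and PCA learn their parameters from the training set and are then reused unchanged on the test set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
# Synthetic stand-ins for train/test predictors and the response (assumption)
X_train = rng.normal(size=(200, 10))
y_train = X_train[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)
X_test = rng.normal(size=(50, 10))

# Center/scale, rotate, and model are all fitted on the training data only
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    DecisionTreeRegressor(random_state=0),
)
model.fit(X_train, y_train)

# predict() applies the training-set scaling and rotation to the test data
predictions = model.predict(X_test)

# The transformed test data on its own, using the first two pipeline steps
test_scores = model[:-1].transform(X_test)

print(predictions.shape, test_scores.shape)
```

Bundling the steps in one pipeline makes it hard to accidentally refit the scaler or the PCA on test data.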

For Python Users:

To implement PCA in Python, import PCA from the sklearn library. The interpretation remains the same as explained for R users above. Of course, the result is the same as derived using R. The data set used for Python is a cleaned version where missing values have been imputed and categorical variables have been converted into numeric ones. The modeling process remains the same as explained for R users above.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
# In a Jupyter notebook, also run: %matplotlib inline
#Load data set
data = pd.read_csv('Big_Mart_PCA.csv')
#convert it to numpy arrays
X = data.values
#Scaling the values
X = scale(X)
pca = PCA(n_components=44)
pca.fit(X)
#The amount of variance that each PC explains
var = pca.explained_variance_ratio_
#Cumulative variance explained (in percent)
var1 = np.cumsum(np.round(pca.explained_variance_ratio_, decimals=4)*100)
print(var1)
 [  10.37   17.68   23.92   29.7    34.7    39.28   43.67   46.53   49.27
 51.92   54.48   57.04   59.59   62.1    64.59   67.08   69.55   72.
 74.39   76.76   79.1    81.44   83.77   86.06   88.33   90.59   92.7
 94.76   96.78   98.44  100.01  100.01  100.01  100.01  100.01  100.01
 100.01  100.01  100.01  100.01  100.01  100.01  100.01  100.01]
plt.plot(var1)
plt.show()

#Looking at above plot I'm taking 30 components
pca = PCA(n_components=30)
#fit_transform fits the model and returns the component scores in one step
X1 = pca.fit_transform(X)
print(X1)

For more information on PCA in python, visit scikit learn documentation.

Conclusion

This brings me to the end of this Principal Component Analysis tutorial. Without delving deep into mathematics, I’ve tried to familiarize you with the most important concepts required to use this technique. It’s simple but needs special attention when deciding the number of components. Practically, we should strive to retain only the first few k components. The idea behind PCA is to construct a small number of principal components (far fewer than the p original predictors) that satisfactorily explain most of the data’s variability and, ideally, its relationship with the response variable.


Key Takeaways

Principal Component Analysis (PCA) is used to overcome feature redundancy in a data set. It produces a smaller set of new features, a.k.a. components, which are normalized linear combinations of the original predictor variables.
The first component has the highest variance, followed by the second, third, and so on. The components must be uncorrelated (remember the orthogonal directions?).
Normalizing data becomes extremely important when the predictors are measured in different units.
PCA works best on data sets having 3 or higher dimensions, because with higher dimensions it becomes increasingly difficult to make interpretations from the resultant data cloud.


