CMAPSS Jet Engine Failure Classification Based On Sensor Data

Adi Zaenul Mustaqim 22 Jul, 2024
17 min read

Introduction

In a future where jet engines can anticipate their own failures before they occur, millions of dollars and possibly lives could be saved. This project uses NASA jet engine simulation data to explore a novel approach to predictive maintenance. We explore how machine learning can assess the condition of these vital components by analyzing sensor data from jet engines, which records variables such as temperature and pressure. This study demonstrates the potential of artificial intelligence (AI) to revolutionize engine maintenance and improve safety by going through the steps of data preparation, feature selection, and the use of sophisticated algorithms like Random Forest and Neural Networks. Come along as we explore the complexities of predictive modeling and data processing to anticipate engine failures before they happen.

Learning Outcomes

  • Learn how AI and machine learning can forecast equipment failures before they occur.
  • Gain skills in preparing and processing complex sensor data for analysis.
  • Get hands-on experience with algorithms like Random Forest and Neural Networks for predictive modeling.
  • Discover how to select and engineer features to improve model accuracy.
  • Learn how predictive maintenance can lead to significant improvements in safety and operational efficiency.

This article was published as a part of the Data Science Blogathon.

Overview of Dataset

The United States space agency, popularly known as NASA, some time ago shared a dataset containing jet engine simulation data. This data includes sensor readings from a jet engine, covering its operation from initial use until failure. It is certainly interesting to explore how we can recognize patterns in the sensor data and then perform classification to determine whether a jet engine is still functioning normally or has failed. This project explores how machine learning models analyze sensor data to predict engine health, following the CRISP-DM concept, a workflow that organizes the data mining process. For more details, let's take a look together!


Business Understanding

This stage will explain the project’s background, define the problems faced, and outline the ultimate goal of the jet engine predictive maintenance project to address the defined issues.

Why is machine failure prediction important?

Jet engines play a crucial role in NASA's space industry, serving as the power source for vehicles like airplanes by generating thrust. Given their importance, we need to analyze and predict the engine's health to determine whether it is functioning normally or requires maintenance. The aim is to avoid sudden engine failure that could endanger the vehicle. One way to measure engine performance is with sensors, which capture variables such as temperature, rotation speed, pressure, and vibration in the engine. This project therefore carries out an analysis to predict engine health from sensor data before the engine actually fails.

What’s the problem?

Ignorance of machine health can potentially lead to sudden machine failure during use.

What’s the objective?

Classify machine health into normal or failure categories based on sensor data.

Data Understanding

This stage is about getting to know the data. The process loads the data and displays the initial dataset before further processing.

Dataset Information

The dataset used in this project comes from the CMAPSS Jet Engine Simulated Data. It consists of several files, broadly grouped into 3 categories: train, test, and RUL. However, this project uses only the training data, specifically train_FD001.txt. This dataset has 26 columns and 20,631 rows.

Feature Explanation

Parameters | Symbol    | Description                     | Unit
Engine     | –         | –                               | –
Cycle      | t         | –                               | –
Setting 1  | –         | Altitude                        | ft
Setting 2  | –         | Mach number                     | M
Setting 3  | –         | Sea-level temperature           | °F
Sensor 1   | T2        | Total temperature at fan inlet  | °R
Sensor 2   | T24       | Total temperature at LPC outlet | °R
Sensor 3   | T30       | Total temperature at HPC outlet | °R
Sensor 4   | T50       | Total temperature at LPT outlet | °R
Sensor 5   | P2        | Pressure at fan inlet           | psia
Sensor 6   | P15       | Total pressure in bypass-duct   | psia
Sensor 7   | P30       | Total pressure at HPC outlet    | psia
Sensor 8   | Nf        | Physical fan speed              | rpm
Sensor 9   | Nc        | Physical core speed             | rpm
Sensor 10  | epr       | Engine pressure ratio           | –
Sensor 11  | Ps30      | Static pressure at HPC outlet   | psia
Sensor 12  | phi       | Ratio of fuel flow to Ps30      | pps/psi
Sensor 13  | NRf       | Corrected fan speed             | rpm
Sensor 14  | NRe       | Corrected core speed            | rpm
Sensor 15  | BPR       | Bypass ratio                    | –
Sensor 16  | farB      | Burner fuel-air ratio           | –
Sensor 17  | htBleed   | Bleed enthalpy                  | –
Sensor 18  | Nf_dmd    | Demanded fan speed              | rpm
Sensor 19  | PCNfR_dmd | Demanded corrected fan speed    | rpm
Sensor 20  | W31       | HPT coolant bleed               | lbm/s
Sensor 21  | W32       | LPT coolant bleed               | lbm/s

Notes:

  • LPC/HPC = Low/High Pressure Compressor
  • LPT/HPT = Low/High Pressure Turbine

View Raw Data

We can check the dimensions and view raw data before processing it further.

import pandas as pd

# Read dataset files and convert to dataframes
data = pd.read_csv("/content/train_FD001.txt", sep=" ", header=None)

# Show dataset dimension
print("Shape of data :", data.shape)

# Show initial data
data

Notes:

  • /content/train_FD001.txt is the path to the dataset file. Adjust it to the file's location on your machine.
  • data.shape returns two values: (number of rows, number of columns).

From the dataset, you can see that the column names are not representative (they are still just numbers) and that the last 2 columns contain only NaN (Not a Number) values. The data needs further cleaning, which we perform during the data preparation stage.

Data Preparation

This stage cleans the data, producing a clean dataset ready for the machine learning modeling process. The term Garbage In, Garbage Out (GIGO) captures why: if the training data is garbage, the resulting model will be garbage too, and useless for prediction. To avoid this, a data preparation process is needed. The processes carried out at this stage include:

Handling NaN value & rename the column name

The last 2 columns contain only NaN values, so we remove them because they contribute nothing to the data. In addition, we rename the columns to make them easier to read and more representative.

# Drop the last 2 columns, which contain only NaN values
data.drop(columns=[26, 27], inplace=True)

# List the column names according to the dataset description
columns = [
    'engine', 'cycle', 'setting1', 'setting2', 'setting3', 'sensor1',
    'sensor2', 'sensor3', 'sensor4', 'sensor5', 'sensor6', 'sensor7',
    'sensor8', 'sensor9', 'sensor10', 'sensor11', 'sensor12', 'sensor13',
    'sensor14', 'sensor15', 'sensor16', 'sensor17', 'sensor18', 'sensor19',
    'sensor20', 'sensor21'
]

# Rename a column in the dataset
data.columns = columns

Renaming the columns according to the dataset description makes the predictors easier to interpret. The dataset now has 26 properly named columns.

View dataset statistics

This process determines statistical details from the data, such as the average value, standard deviation, minimum value, Q1, median, Q2, and maximum value for each column.

# View dataset statistics
data.describe().transpose()

The statistics reveal that several predictors have identical min and max values. This indicates that those predictors are constant, holding the same value in every row. A constant predictor cannot affect the target, so we remove such predictors to reduce computation time.

Removing constant-value columns

A constant-value column is characterized by identical min and max values. Here is a function that removes such columns.

def drop_constant_value(dataframe):
    '''
    Function:
        - Deletes constant-value columns in the dataset.
        - A constant value is a value that is the same for all rows in the dataset.
        - A column is considered constant if its minimum (min) and maximum (max) values are the same.
    Args:
        dataframe -> dataset to validate
    Returned value:
        dataframe -> dataset cleared of constant-value columns
    '''

    # Temporary list to store the names of constant-value columns
    constant_column = []

    # Find constant columns by comparing the minimum and maximum values
    # (use col_min/col_max to avoid shadowing the built-in min/max)
    for col in dataframe.columns:
        col_min = dataframe[col].min()
        col_max = dataframe[col].max()

        # Record the column name if the min and max values are equal
        if col_min == col_max:
            constant_column.append(col)

    # Delete the columns with constant values
    dataframe.drop(columns=constant_column, inplace=True)

    # Return the cleaned data
    return dataframe

# Call the function to drop constant-value columns
data = drop_constant_value(data)
data

After the constant-value removal process, 19 predictors remain of the original 26, which shows that 7 predictors held constant values.

Creating a Label for the Prediction Target

Since this is a classification task and the dataset doesn’t have a target column, it is necessary to create a target column manually. We will create a target that classifies the machine as either normal or failed (binary classification). In this project, we will label normal status as 0 and failure as 1.

We use a threshold value of 20 to determine whether a cycle is labeled as failure or normal. This value is subjective; we chose 20 to anticipate complete engine failure while 20 cycles still remain, so that technicians can inspect the engine earlier and prepare a replacement, avoiding sudden engine failure during use. Concretely, for each engine, any cycle greater than (maximum cycle − threshold) is labeled as failure. For example, if engine 1 has a maximum cycle of 120, cycles 101 to 120 are labeled as failure. Here is the function that creates the machine status label.

def assign_label(data, threshold):
    '''
    Function:
        - Label each cycle as normal (0) or failure (1)
    Args:
        - data -> dataset to be labeled
        - threshold -> number of cycles before failure to label as failure
    Return:
        - data -> labeled dataset
    '''

    # Iterate over every engine in the dataset (FD001 contains 100 engines)
    for i in data['engine'].unique():
        # Get the max cycle of this engine
        max_cycle = data.loc[(data['engine'] == i), 'cycle'].max()

        # Determine the cycle at which the failure label starts
        start_warning = max_cycle - threshold

        # Assign label 1 (failure) to the last `threshold` cycles
        data.loc[(data['engine'] == i) & (data['cycle'] > start_warning), 'status'] = 1

    # Assign label 0 (normal) to all remaining cycles
    data['status'].fillna(0, inplace=True)

    # Return labeled dataset
    return data
    
    
# Determine the threshold value    
threshold = 20

# Call assign_label function to get label
data = assign_label(data, threshold)

# Show data after labelling
data

View feature correlation with heatmap

The strength of the relationship between variables, known as the correlation value, is commonly divided into five categories, from very weak to very strong.
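The figure showing these categories is not reproduced here. As a rough sketch, one commonly used convention (the exact cut-offs vary by source and are an assumption, not taken from the original figure) maps the absolute correlation to a category like this:

# Hypothetical helper illustrating a common correlation-strength convention
def correlation_strength(r):
    '''Map a correlation value to one of five conventional strength categories.'''
    r = abs(r)
    if r < 0.2:
        return "very weak"
    elif r < 0.4:
        return "weak"
    elif r < 0.6:
        return "moderate"
    elif r < 0.8:
        return "strong"
    else:
        return "very strong"

print(correlation_strength(-0.7))  # strong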

We will use a heatmap visualization to see the correlation values between the predictors and the target, applying a threshold value of 0.20 in this project.


import matplotlib.pyplot as plt
import seaborn as sns

# Heatmap for checking the correlation
threshold = 0.2
plt.figure(figsize=(12, 10))
sns.set(font_scale=0.7)
sns.set_style("whitegrid", {"axes.facecolor": ".0"})

# Compute the correlation matrix and mask cells below the threshold
cluster = data.corr()
mask = cluster.where((abs(cluster) >= threshold)).isna()
sns.heatmap(cluster,
            cmap='RdYlBu',
            annot=True,
            mask=mask,
            linewidths=0.2,
            linecolor='lightgrey').set_facecolor('white')
plt.title("Feature Correlation using Heatmap")
plt.show()

The heatmap displays only predictors whose absolute correlation value is greater than or equal to the threshold. We use a threshold of 0.2, treating correlations below it as too weak to be useful for this task.

A negative correlation value indicates that two predictors move in opposite directions. For example, sensor 2 and sensor 7 have a correlation value of -0.7: when the value of sensor 2 increases, the value of sensor 7 tends to decrease, and vice versa. The absolute correlation value lies between 0 and 1, where 0 means no linear relationship and 1 means a very strong one; the larger the absolute value, the stronger the relationship.

Feature selection

In some cases, not all predictors (columns) in the dataset have a strong enough influence on the target. For this reason, a feature selection process is needed to remove features with little influence; the goal is to reduce the time and computational load of the learning process. As in the previous stage, we use a threshold value of 0.2, so predictors whose absolute correlation with the target is < 0.2 are removed. Here is the code for feature selection.

# Show predictor that have correlation value >= threshold
correlation = data.corr()
relevant_features = correlation[abs(correlation['status']) >= threshold]
relevant_features['status']

# Keep a relevant features (correlation value >= threshold)
list_relevant_features = list(relevant_features.index[1:])

# Applying feature selection
data = data[list_relevant_features]
Feature selection

After the feature selection process, we are left with 15 columns consisting of 14 predictors and 1 target.
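As a quick sanity check (a small snippet not in the original code), we can confirm this column count:

# Confirm that 15 columns remain: 14 predictors + the 'status' target
print("Shape after feature selection:", data.shape)
print("Columns:", list(data.columns))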

View the proportion of classes in the dataset

The next step is to look at the proportion of classes in the dataset. We will look at the proportion of normal (0) and failure (1) classes. This is done to determine the balance of the dataset.
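The original article shows this proportion as a bar chart. A minimal sketch to reproduce it, assuming the same data variable, could look like this:

import seaborn as sns
import matplotlib.pyplot as plt

# Count and plot the number of cycles per class (0 = normal, 1 = failure)
sns.countplot(x='status', data=data)
plt.title("Class proportion before sampling")
plt.show()

print("0: ", len(data[data['status'] == 0]), " data")
print("1: ", len(data[data['status'] == 1]), " data")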


The counts show that the dataset contains 18,631 cycles classified as normal and 2,000 cycles classified as failure, so the minority class makes up only 9.7% of the total dataset. A dataset like this is referred to as an imbalanced dataset; since the imbalance here is moderate, we perform a sampling process to increase the number of minority data points.

Split the dataset into training and test data

Before balancing the data (sampling process), first divide it into two parts: train data and test data. Use the train data to build machine learning models and the test data to evaluate the performance of the resulting models.

In this project, we use an 80:20 split, meaning 80% of the data is used for training and 20% for testing. There is no fixed rule for choosing this ratio; common schemes include 60:40, 70:30, 75:25, 80:20, and 90:10. The one constant is that the test set should not be larger than the training set. Additionally, we divide the data into predictor columns (prefix X) and target columns (prefix y).

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Determine predictor (X) and target (y)
X = data.iloc[:,:-1]
y = data.iloc[:,-1:]

# Split dataset into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Change y_train into 1 dimension form
y_train = y_train.squeeze()

After the dataset is divided, we look at the number of train data and test data by using the shape function.

# Check dimension of data train and test
print("Shape of train : ", X_train.shape)
print("Shape of test  : ", X_test.shape)

Out of the total 20,631 data points in the dataset, 16,504 are used for training and 4,127 for testing. The number 14 is the count of predictors whose patterns will be analyzed during the learning process.

Sampling Dataset using SMOTE

The sampling process is used to overcome the problem of imbalanced datasets. The purpose of this process is to balance the class proportions so that the normal and failure classes have the same amount of data. This makes the machine learning model sensitive to both classes (normal and failure), not just one of them.

To prevent data leakage from the test data, you should perform the sampling process only on the train data. Therefore, in the previous stage, we first divided the data into training and testing sets.

In this project, we use an oversampling technique to generate synthetic data for the minority class (failure) until it matches the number of samples in the majority class (normal). The algorithm used is the Synthetic Minority Oversampling Technique (SMOTE).

from imblearn.over_sampling import SMOTE

# Oversampling process to overcome the imbalanced dataset
smote = SMOTE(random_state=42)
X_train, y_train = smote.fit_resample(X_train, y_train)

# Class proportion checking
# (work on a copy so the 'status' column is not added to X_train itself)
data = X_train.copy()
data['status'] = y_train

sns.countplot(x='status', data=data)
plt.title("Class proportion after sampling")
plt.xlabel('Engine Status')
plt.ylabel('Number of Cycles')
print("0: ", len(data[data['status'] == 0]), " data")
print("1: ", len(data[data['status'] == 1]), " data")

The barplot above shows that after the oversampling process, the data for normal and failure machines is balanced, with each status having 14,861 data points.

Scaling Value using Z-Score

Just like the sampling process, the scaler must be fitted on the train data only, to prevent data leakage from the test data. Additionally, scaling must be performed after sampling, not before. Therefore, we first split the data into train and test sets, then perform sampling, and finally apply scaling.

The scaling process equalizes the range of values across all features. This reduces the computational burden during training and can improve the performance of the resulting model. Scaling is carried out when some predictors have values far above those of other predictors.

In this project, the Z-Score method (standardization) will be used for the scaling process.
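For reference, the Z-Score transforms each feature value x into

z = (x − μ) / σ

where μ is the mean and σ the standard deviation of that feature, computed from the training data only. This is exactly what scikit-learn's StandardScaler does: fit_transform() learns μ and σ from the train set, and transform() applies those same parameters to the test set.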

# Ensure X_train is a DataFrame containing only the predictor columns
X_train = pd.DataFrame(X_train, columns = X.columns)

# Scaling process: fit the scaler on the train data, then apply it to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Show data after the scaling process
X_train_scaling = pd.DataFrame(X_train, columns = X.columns)
X_train_scaling

From the scaling results, we can see that all predictors now have value ranges that are not far apart. This facilitates the process of building machine learning models and reduces the time and computational resources required.

Modeling & Evaluation

This stage is a process of creating a machine learning model that will later be used for the prediction process. Some of the things done in this phase are:

  • Selection of the machine learning algorithm to be used and hyperparameter tuning.
  • Fitting process or model learning process.
  • Model evaluation process to determine the performance of the model.

This stage produces a trained model that is ready for the prediction process.

Random Forest (RF) Model

Random forest is a popular classification algorithm due to its excellent performance. A detailed discussion of random forest is beyond the scope of this article.

After the data is cleaned in the pre-processing process, the next step is to build a machine learning model. To create an ML model from random forest, we will use the library provided by scikit-learn.

# Creating object from RandomForestClassifier() class
model = RandomForestClassifier()

# Training process
model = model.fit(X_train, y_train)

# Predicting test data
y_predict = model.predict(X_test)

Notes

  • The RandomForestClassifier() class from the scikit-learn library creates a machine learning model using the random forest algorithm.
  • The fit() function runs the training process that creates the ML model. It takes two arguments, X_train and y_train: X_train contains the predictor data, while y_train contains the target data.
  • The predict() function predicts new data. It takes one argument, X_test, the predictor data of the test set, and produces the predicted targets, which are stored in the y_predict variable.

After predicting the data using the predict() function, we evaluate the prediction results to find out whether the resulting model is good. To evaluate, we use several measures: accuracy, precision, recall, and F1 score. First, we use the confusion matrix to determine the values of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) before calculating these evaluation metrics.
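Expressed in terms of these four counts, the metrics are computed as:

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
F1 score  = 2 × (precision × recall) / (precision + recall)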

# Visualize confusion matrix table
matrix = metrics.confusion_matrix(y_test, y_predict)
matrix_display = metrics.ConfusionMatrixDisplay(confusion_matrix = matrix, display_labels = ["normal", "failure"])
matrix_display.plot()
plt.grid(False)
plt.show()

Explanation

The confusion matrix table above reveals the following:

  • True Positive (TP): failure cycles correctly predicted as failure. There are 336 of these.
  • True Negative (TN): normal cycles correctly predicted as normal. There are 3,657 of these.
  • False Positive (FP): normal cycles predicted as failure. There are 113 of these.
  • False Negative (FN): failure cycles predicted as normal. There are 21 of these.
print("Accuracy  : ", metrics.accuracy_score(y_test, y_predict))
print("Precision : ", metrics.precision_score(y_test, y_predict))
print("Recall    : ", metrics.recall_score(y_test, y_predict))
print("F1 Score  : ", metrics.f1_score(y_test, y_predict))

From the evaluation scores above, we can conclude as follows:

  • The accuracy value shows that the model predicts 96% of the data correctly. In other words, out of 4,127 test data points, the model correctly predicts 3,993 (TP + TN).
  • The precision value shows that of all the cycles the model predicted as failure, only about 75% are correct. In other words, of the 449 cycles predicted as failure, only 336 were actually in failure status; the rest are normal.
  • The recall value shows that the model successfully identified 94% of the cycles with failure status. In other words, out of 357 cycles that were indeed failures, the model correctly predicted 336; only 21 failure cycles were predicted as normal.
  • The F1 score shows that the model recognizes both normal and failure cycle conditions well, without leaning towards only one condition.

Building Artificial Neural Network (ANN) Model

ANN is a machine learning algorithm that is the forerunner of deep learning. It is called "neural" because it mimics how neurons in the human brain transfer signals to other neurons. A deeper discussion of ANNs is beyond the scope of this article.

In this project, the Tensorflow library will be used to build the ANN model. Here is the code to build the ANN architecture.

# Import library to build neural network architecture
from keras.layers import Dense, LeakyReLU
from keras.models import Sequential

# Import library for optimization
from keras.optimizers import Adam

# Import library to prevent overfitting
from keras.callbacks import EarlyStopping
from keras.regularizers import l2

# Build neural network architecture
model = Sequential()
model.add(Dense(512, input_dim=X_train.shape[1], activation = LeakyReLU(), kernel_regularizer=l2(0.01)))
model.add(Dense(256, activation = LeakyReLU(), kernel_regularizer=l2(0.01)))
model.add(Dense(128, activation = LeakyReLU(), kernel_regularizer=l2(0.01)))
model.add(Dense(1, activation = 'sigmoid'))

opt = Adam(learning_rate = 0.0001) # optimizer
model.compile(optimizer = opt,
              loss = 'binary_crossentropy',
              metrics=['accuracy'])

# Create a object from EarlyStopping class
earlystopper = EarlyStopping(
    monitor = 'val_loss',
    min_delta = 0,
    patience = 5,
    verbose= 1)

# Fitting network
history = model.fit(
    X_train,
    y_train,
    epochs = 200,
    batch_size = 128,
    validation_split = 0.20,
    verbose = 1,
    callbacks = [earlystopper])

history_dict = history.history

Neural Network Algorithm Architecture

The Neural Network algorithm used has the following architecture:

  • Number of layers => 5, consisting of 1 input layer, 3 hidden layers, and 1 output layer.
  • The input layer has 14 neurons, matching the number of predictors in the train data.
  • Hidden layers 1, 2, and 3 have 512, 256, and 128 neurons respectively.
  • The output layer has 1 neuron with a sigmoid activation function, so it produces a fractional value between 0 and 1. This project uses a threshold of 0.5: if the output value is >= 0.5 the prediction is failure, and if < 0.5 it is normal (see the formulas after this list).
  • This architecture uses the Adam optimizer, which adjusts the weight of each neuron during the learning process.
  • The loss function used is binary_crossentropy, which calculates the error at the output layer by measuring the difference between the actual and predicted values (see the formulas after this list).
  • The evaluation metric measured during training is the accuracy value.
  • The training process uses the EarlyStopping() callback to stop training if the validation loss does not improve for 5 consecutive epochs (patience = 5).
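For reference, the sigmoid activation and the binary cross-entropy loss mentioned in the list above are defined as:

sigmoid(x) = 1 / (1 + e^(−x))

binary_crossentropy = −(1/N) Σ [ y·log(ŷ) + (1 − y)·log(1 − ŷ) ]

where y is the actual label (0 or 1), ŷ is the predicted probability from the sigmoid output, and N is the number of samples.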

After completing the training process, we will evaluate the ANN model’s performance, similar to the approach used with Random Forest. The following is the confusion matrix code from ANN.

# Predicting test data
y_predict = (model.predict(X_test) > 0.5).astype('int32')

# Show confusion matrix table
matrix = metrics.confusion_matrix(y_test, y_predict)
matrix_display = metrics.ConfusionMatrixDisplay(confusion_matrix = matrix, display_labels = ["normal", "failure"])
matrix_display.plot()
plt.grid(False)
plt.show()
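The numeric scores discussed in the next section can be printed with the same metric snippet used for the Random Forest model (repeated here for convenience; it does not appear at this point in the original code):

# Evaluation metrics for the ANN predictions
print("Accuracy  : ", metrics.accuracy_score(y_test, y_predict))
print("Precision : ", metrics.precision_score(y_test, y_predict))
print("Recall    : ", metrics.recall_score(y_test, y_predict))
print("F1 Score  : ", metrics.f1_score(y_test, y_predict))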

Evaluation Score Conclusion

From the evaluation scores above, we can conclude as follows:

  • The accuracy value shows that the model predicts 96% of the data correctly. In other words, out of 4,127 test data points, the model correctly predicts 3,992.
  • The precision value shows that of all the cycles the model predicted as failure, only about 75% are correct. In other words, of the 448 cycles predicted as failure, only 335 were actually in failure status; the rest are normal.
  • The recall value shows that the model successfully identified 93% of the cycles that actually had failure status. In other words, out of 357 cycles that were indeed failures, the model correctly predicted 335; only 22 failure cycles were predicted as normal.
  • The F1 score shows that the model recognizes both normal and failure cycle conditions well, without leaning towards only one condition.

Conclusion

This article underscores the transformative potential of machine learning in predictive maintenance for jet engines. By leveraging NASA’s comprehensive simulation data, we demonstrated how advanced algorithms like Random Forest and Neural Networks can effectively forecast engine failures, thus significantly enhancing operational safety and efficiency. The successful application of feature selection, data preparation, and sophisticated modeling techniques highlights the critical role of predictive analytics in preempting equipment failures. As we advance, these insights not only pave the way for more reliable engine maintenance strategies but also set a precedent for future innovations in predictive maintenance across various industries.

The full code is available on GitHub.

Key Takeaways


  • Predictive maintenance can significantly enhance jet engine safety and efficiency.
  • Machine learning models like Random Forest and Neural Networks are effective in forecasting engine failures.
  • Feature selection and data preparation are crucial for accurate predictive maintenance.
  • NASA’s simulation data provides a robust foundation for predictive analytics in aviation.
  • Advancements in predictive maintenance set a precedent for innovations across industries.

Frequently Asked Questions

Q1. What is predictive maintenance for jet engines?

A. Predictive maintenance uses data and algorithms to forecast when jet engine components might fail, allowing for timely repairs and minimizing downtime.

Q2. Why is predictive maintenance important for jet engines?

A. It enhances safety, reduces unexpected failures, and lowers maintenance costs by addressing issues before they lead to significant problems.

Q3. What types of machine learning models are used in predictive maintenance?

A. Common models include Random Forest and Neural Networks, which analyze historical data to predict potential failures.

Q4. How does NASA contribute to predictive maintenance?

A. NASA provides simulation data that helps develop and refine predictive maintenance algorithms for jet engines.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Adi Zaenul Mustaqim 22 Jul, 2024

I’m a fresh graduate from the Master of Computer Science study program, Gadjah Mada University. Experience developing desktop-based applications using C#, web development using Laravel. Also experienced in developing machine learning models (data collection, data preparation, exploratory data analysis, data modeling, visualization and reporting, deployment).
