In this article, we will work through heart disease prediction using machine learning. We will explore a heart disease dataset and derive insights that show how much weight each feature carries and how the features relate to one another. Our main aim is to estimate the probability that a person will be affected by a severe heart problem.
This article was published as a part of the Data Science Blogathon.
Heart disease prediction using machine learning involves analyzing medical information like age, blood pressure, and cholesterol levels to forecast the likelihood of someone having heart issues. By training computer models with this data, we can create systems that help identify individuals at risk of heart disease, aiding in prevention and early intervention.
Using machine learning for disease prediction involves teaching computers to study lots of medical information to guess if someone might get sick. For example, with heart disease prediction using machine learning, computers can look at factors like age, blood pressure, and cholesterol levels to guess who might have heart problems in the future. This helps doctors catch issues early and keep people healthy.
# Plotting libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import cufflinks as cf
%matplotlib inline

# Metrics for classification
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Scaler and data splitting
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Model building
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
Here we will use the pandas read_csv function to read the dataset. Specify the location of the dataset and import it.

# Importing data
data = pd.read_csv("heart.csv")
data.head(6)  # Number of rows to display from the top
Now, let’s see the size of the dataset.

data.shape

Output: (303, 14)

Inference: The dataset has 303 rows and 14 columns, which makes it a small dataset.
Having seen the size of the dataset, let’s now look at the type of each feature it holds.
Python Code:
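The code for this step is not shown here, so the following is only a minimal sketch of one common way to inspect column types with pandas. The small DataFrame is hypothetical and merely mimics a few columns of heart.csv; on the real dataset the same calls would be made on data.

```python
import pandas as pd

# Hypothetical rows that mimic a few columns of heart.csv (values are made up).
sample = pd.DataFrame({
    "age": [63, 37, 41],
    "sex": [1, 1, 0],
    "chol": [233, 250, 204],
    "oldpeak": [2.3, 3.5, 1.4],
    "target": [1, 1, 0],
})

# dtypes lists the storage type of every column.
print(sample.dtypes)

# isnull().sum() counts the missing values in each column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

Calling sample.info() prints the same type information together with the non-null counts in a single report.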
Inference: The inference we can derive from the above output is:
Now that we know something about each feature, let’s look at the summary statistics of the dataset.
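A statistical summary like this is usually produced with the pandas describe method. The sketch below assumes that approach and uses a hypothetical three-row DataFrame instead of the real dataset.

```python
import pandas as pd

# Hypothetical values standing in for two numeric columns of heart.csv.
sample = pd.DataFrame({
    "age": [29, 54, 77],
    "chol": [204, 233, 250],
})

# describe() reports count, mean, std, min, quartiles and max per column.
summary = sample.describe()
print(summary)
```

Reading the min and max rows is a quick way to spot impossible values, and comparing mean with the 50% row hints at skew.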
It is always good practice to check the correlation between the features so that we can see which features are positively correlated and which are negatively correlated. Let’s check the correlation between the various features.

plt.figure(figsize=(20, 12))
sns.set_context('notebook', font_scale=1.3)
sns.heatmap(data.corr(), annot=True, linewidths=2)
plt.tight_layout()
So far we have checked the correlation between the features, but it is also good practice to check how each feature correlates with the target variable.

So, let’s do this!

sns.set_context('notebook', font_scale=2.3)
data.drop('target', axis=1).corrwith(data['target']).plot(kind='bar', grid=True, figsize=(20, 10), title="Correlation with the target feature")
plt.tight_layout()
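To make the idea of correlation with the target concrete, here is a small sketch that uses the pandas corrwith method on a hypothetical six-row DataFrame. The column names follow the dataset, but the values are invented so that one feature rises with the target and the other falls.

```python
import pandas as pd

# Hypothetical data: thalach is higher and oldpeak is lower when target is 1.
toy = pd.DataFrame({
    "thalach": [150, 187, 172, 120, 110, 132],
    "oldpeak": [0.5, 0.0, 0.2, 2.6, 3.1, 1.8],
    "target": [1, 1, 1, 0, 0, 0],
})

# Correlate every feature column with the target column.
corr_with_target = toy.drop("target", axis=1).corrwith(toy["target"])
print(corr_with_target.sort_values())
```

A positive value means the feature tends to be larger when the target is 1, and a negative value means it tends to be smaller.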
Inference: Insights from the above graph are:
So, we have done enough collective analysis. Now let’s analyze the individual features, which involves both univariate and bivariate analysis.
Here we will check the 10 most frequent ages and their counts.

plt.figure(figsize=(25, 12))
sns.set_context('notebook', font_scale=1.5)
sns.barplot(x=data.age.value_counts()[:10].index, y=data.age.value_counts()[:10].values)
plt.tight_layout()

Inference: Here we can see that age 58 has the highest frequency.
Let’s check the range of age in the dataset.

minAge = min(data.age)
maxAge = max(data.age)
meanAge = data.age.mean()
print('Min Age :', minAge)
print('Max Age :', maxAge)
print('Mean Age :', meanAge)

Min Age : 29
Max Age : 77
Mean Age : 54.366336633663366
We can divide the Age feature into three groups: “Young”, “Middle” and “Elder”.

Young = data[(data.age >= 29) & (data.age < 40)]
Middle = data[(data.age >= 40) & (data.age < 55)]
older = data[(data.age >= 55)]  # >= so that age 55 is not left out of every group
plt.figure(figsize=(23, 10))
sns.set_context('notebook', font_scale=1.5)
sns.barplot(x=['young ages', 'middle ages', 'older ages'], y=[len(Young), len(Middle), len(older)])
plt.tight_layout()
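Age groups like these can also be built in one step with pandas.cut. The sketch below is an alternative, not the approach used above: the edges 29, 40 and 55 come from the text, while the upper edge of 78 and the choice to place age 55 in the older group are assumptions. The ages themselves are hypothetical.

```python
import pandas as pd

# Hypothetical ages covering every boundary of the three groups.
ages = pd.Series([29, 35, 40, 54, 55, 60, 77], name="age")

# right=False makes each bin closed on the left: [29, 40), [40, 55), [55, 78).
age_group = pd.cut(
    ages,
    bins=[29, 40, 55, 78],
    right=False,
    labels=["young", "middle", "older"],
)

# Count how many people fall into each group.
counts = age_group.value_counts()
print(counts)
```

Because the bins share their edges, every age between 29 and 77 lands in exactly one group.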
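Inference: Here we can see that older people form the largest group in the dataset and young people the smallest. Since these counts cover every patient, with or without heart disease, they describe the age distribution of the dataset rather than who is most affected.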
To see the same split as proportions, we will plot a pie chart.

colors = ['blue', 'green', 'yellow']
explode = [0, 0, 0.1]
plt.figure(figsize=(10, 10))
sns.set_context('notebook', font_scale=1.2)
plt.pie([len(Young), len(Middle), len(older)], labels=['young ages', 'middle ages', 'older ages'], explode=explode, colors=colors, autopct='%1.1f%%')
plt.tight_layout()
plt.figure(figsize=(18, 9))
sns.set_context('notebook', font_scale=1.5)
sns.countplot(x='sex', data=data)
plt.tight_layout()

Inference: Here it is clearly visible that the ratio of males to females is approximately 2:1.
Now let’s plot the relation between sex and slope.

plt.figure(figsize=(18, 9))
sns.set_context('notebook', font_scale=1.5)
sns.countplot(x='sex', hue='slope', data=data)
plt.tight_layout()

Inference: Here it is clearly visible that the slope value is higher in the case of males (1).
plt.figure(figsize=(18, 9))
sns.set_context('notebook', font_scale=1.5)
sns.countplot(x='cp', data=data)
plt.tight_layout()
Inference: As seen, there are 4 types of chest pain
Analyzing cp vs target column
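The code for the cp versus target comparison is not shown here. As a rough sketch of one way to make that comparison, the example below tabulates chest pain type against the target with pandas.crosstab. The eight rows are hypothetical.

```python
import pandas as pd

# Hypothetical chest pain types (cp) and outcomes (target).
toy = pd.DataFrame({
    "cp": [0, 0, 0, 1, 2, 2, 3, 1],
    "target": [0, 0, 1, 1, 1, 1, 1, 0],
})

# Rows are chest pain types, columns are target classes, cells are counts.
table = pd.crosstab(toy["cp"], toy["target"])
print(table)
```

The same comparison can be drawn as a plot with sns.countplot(x='cp', hue='target', data=data).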
Inference: From the above graph we can make some inferences,
Older people are more likely to experience chest pain.
plt.figure(figsize=(18, 9))
sns.set_context('notebook', font_scale=1.5)
sns.countplot(x='thal', data=data)
plt.tight_layout()

plt.figure(figsize=(18, 9))
sns.set_context('notebook', font_scale=1.5)
sns.countplot(x='target', data=data)
plt.tight_layout()
Inference: The ratio between classes 1 and 0 is well below 1.5, which indicates that the target feature is not imbalanced. Since the dataset is balanced, we can use accuracy_score as the evaluation metric for our model.
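The class ratio mentioned above can be computed directly instead of being read off the plot. This is a minimal sketch on a hypothetical target column with six positives and five negatives. The 1.5 threshold is the rule of thumb used in this article, not a general standard.

```python
import pandas as pd

# Hypothetical target column: six cases of class 1 and five of class 0.
target = pd.Series([1] * 6 + [0] * 5, name="target")

# Count each class and divide the larger count by the smaller one.
counts = target.value_counts()
ratio = counts.max() / counts.min()
print(counts)
print("Majority to minority ratio:", ratio)

# Rule of thumb from the text: a ratio below 1.5 is treated as balanced.
is_balanced = ratio < 1.5
```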
Now we will look at the unique values of every column and separate the continuous features from the categorical ones.

categorical_val = []
continous_val = []
for column in data.columns:
    print("--------------------")
    print(f"{column} : {data[column].unique()}")
    if len(data[column].unique()) <= 10:
        categorical_val.append(column)
    else:
        continous_val.append(column)
Now we will first remove the target column from our list of categorical features. Then we will encode all the categorical variables using the get_dummies method, which creates a separate column for each category. For example, if a variable X contains 2 unique values, get_dummies creates 2 columns for X.

categorical_val.remove('target')
dfs = pd.get_dummies(data, columns=categorical_val)
dfs.head(6)
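To see what get_dummies does to the columns, here is a small sketch on a hypothetical three-row DataFrame. The column names follow the dataset, but the values are made up.

```python
import pandas as pd

# Hypothetical rows: sex has 2 unique values and cp has 3.
toy = pd.DataFrame({
    "age": [63, 37, 41],
    "sex": [1, 1, 0],
    "cp": [3, 2, 1],
})

# Each listed column is replaced by one indicator column per unique value.
encoded = pd.get_dummies(toy, columns=["sex", "cp"])
print(encoded.columns.tolist())
```

The age column is left untouched, while sex becomes 2 columns and cp becomes 3, so the 3 original columns turn into 6.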
Now we will use StandardScaler to standardize the continuous features so that features measured on large scales do not dominate the others. Distance-based models such as KNN generally perform better on data that is scaled to common units.

sc = StandardScaler()
col_to_scale = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']
dfs[col_to_scale] = sc.fit_transform(dfs[col_to_scale])
dfs.head(6)
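The effect of standard scaling is easy to verify on a small example. The sketch below uses hypothetical age and chol values and checks that each scaled column ends up with mean 0 and standard deviation 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical values on very different scales.
toy = pd.DataFrame({
    "age": [29.0, 45.0, 54.0, 61.0, 77.0],
    "chol": [180.0, 204.0, 233.0, 250.0, 310.0],
})

# fit_transform subtracts each column's mean and divides by its standard deviation.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(toy), columns=toy.columns)

print(scaled.mean().round(6))
# StandardScaler uses the population standard deviation, hence ddof=0.
print(scaled.std(ddof=0).round(6))
```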
# Splitting our dataset
X = dfs.drop('target', axis=1)
y = dfs['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
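With test_size=0.3, a dataset of 303 rows is split into 212 training rows and 91 test rows. The sketch below shows the same call on a hypothetical array of 20 rows. The stratify argument is an optional addition that is not part of the original split. It keeps the class proportions equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows, 2 features, perfectly balanced classes.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# 30% of the rows go to the test set; stratify=y is an assumed extra.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
print("Class 1 share in train:", y_train.mean())
print("Class 1 share in test:", y_test.mean())
```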
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)
y_pred1 = knn.predict(X_test)
print(accuracy_score(y_test, y_pred1))
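Accuracy alone hides which class the model gets wrong, so it can help to look at the confusion matrix and the classification report imported earlier. The sketch below is illustrative only. It trains the same kind of KNN model on synthetic data from make_classification, an assumption made here so that the example runs without heart.csv, and its scores say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the heart data: 300 rows, 13 features, 2 classes.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Same model settings as in the article.
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows of the matrix are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
acc = accuracy_score(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))
print("Accuracy:", acc)
```

The diagonal of the confusion matrix holds the correct predictions, so accuracy equals the diagonal sum divided by the total number of test rows.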
Here's the repo link to this article.
The Heart Disease prediction will have the following key takeaways:
There is no single best algorithm for heart disease prediction using machine learning. Commonly used ones include logistic regression, decision trees, and random forests.
A commonly used machine learning model for detecting heart attacks is the convolutional neural network (CNN), which can analyze electrocardiogram (ECG) recordings to identify signs of a heart attack.
Heart attacks can be detected using AI by training machine learning models with medical data, such as ECGs, patient history, and vital signs, to recognize patterns indicative of a heart attack.
Yes, machines can detect heart attacks using advanced algorithms and medical data, making early detection possible and aiding in timely medical intervention.
