Understand Machine Learning Easily Using Python Shapash Library

Akshay | Last Updated: 28 Apr, 2021

7 min read

This article was published as a part of the Data Science Blogathon.

Topics to be covered

What is Shapash library
The objective of Shapash library
Features of Shapash library
How does Shapash Work
Installation
Getting Started
Conclusion

What is Shapash library?

Model interpretability and explainability have been the focus of many research papers and open-source contributions. However, most of these are aimed at data specialists and trained professionals. Shapash is a Python library that visualizes the decision-making process of machine learning models. It aims to make these models trustworthy for everyone by making them more transparent and easier to understand. Shapash produces clear visualizations of global and local explainability.

It also makes it easy to create a web application that can provide a lot of value to end users and business owners. Shapash is compatible with most sklearn, lightgbm, xgboost, and catboost models and can be used for classification and regression tasks. It uses a Shap backend to compute the local contribution of features; however, this can be replaced with another method for computing local contributions. Data scientists can use the Shapash explainer to explore and debug their models, or deploy it to provide visualizations with each inference.
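To make "local contribution of features" concrete, here is a minimal brute-force sketch of Shapley values for a single prediction. It is only an illustration under stated assumptions: the toy price model, its coefficients, and the baseline row are made up, and the Shap library that Shapash relies on uses far more efficient algorithms than enumerating every feature ordering.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row.

    Averages each feature's marginal contribution over every feature
    ordering. Features not yet revealed keep their baseline value.
    Only practical for a handful of features.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


# Hypothetical toy model: area, quality score, garage (0 or 1).
def toy_price(row):
    area, quality, garage = row
    return 50.0 * area + 1000.0 * quality + 2000.0 * garage * quality


x = [120.0, 7.0, 1.0]          # the house we want to explain
baseline = [100.0, 5.0, 0.0]   # a made-up reference house
contributions = shapley_values(toy_price, x, baseline)
```

The contributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is the property that makes them readable as "how much each feature moved this prediction".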

Objective of Shapash

1. To show clear and understandable results: plots and outputs use labels for each feature and its modalities.

2. To allow data scientists to quickly understand their models by using a web app to easily navigate between global and local explainability, and to see how the different features contribute.

3. To summarize and export the local explanation: Shapash proposes a short and clear local explanation. It allows every user, whatever their data background, to understand a local prediction of a supervised model thanks to a summarized and clear explanation.

4. To provide a full data science report.

Check the report example here
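The summarized local explanation from objective 3 can be sketched as below. This assumes `xpl` is a SmartExplainer that has already been compiled with predictions; the helper name `summarize_local_explanations` is hypothetical, and the default of three contributions is an arbitrary choice.

```python
def summarize_local_explanations(xpl, max_contrib=3):
    """Return one row per prediction with its strongest contributions.

    `xpl` is assumed to be a compiled Shapash SmartExplainer.
    """
    # to_pandas() builds a DataFrame holding, for each row, the prediction
    # and the top `max_contrib` features with their values and contributions.
    return xpl.to_pandas(max_contrib=max_contrib)
```

Because the result is a regular pandas DataFrame, it can then be exported with the usual pandas writers, for example `to_csv`.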

Shapash Features

Some of the features of Shapash are shown below:

Machine Learning models: It works with both classification (Binary or Multiclass problems) and Regression problems. It supports many models like Catboost, Xgboost, LightGBM, Sklearn Ensemble, Linear models, and SVM.
Feature Encoding: It supports a large number of encoding techniques to handle categorical features in our dataset like One Hot Encoding, Ordinal Encoding, Base N Encoding, Target Encoding, or Binary Encoding, etc.
Sklearn ColumnTransformer: It supports OneHotEncoder, OrdinalEncoder, StandardScaler, QuantileTransformer, and PowerTransformer.
Visualizations: Provides a set of visuals to easily interpret your results. Display understandable and clear results.
It is compatible with Lime and Shap. It uses Shap backend to show results in just a few lines of code.
It provides a lot of options for parameters to get your results concisely.
Easy to install and use: It provides a SmartExplainer class to understand your model and summarize the explanation with a simple syntax.
Deployment: It is useful for exploration and for deployment (through an API or in batch mode) for operational use. It also makes it easy to create a web app to navigate from global to local explanations.
High flexibility: Very few arguments are needed to display results. However, the more you work on cleaning and documenting the data, the clearer the results will be for the end user.

How does Shapash Work

Shapash is a package that makes machine learning understandable and interpretable. Data enthusiasts can understand their models easily and, at the same time, share them. Shapash uses Lime and Shap as a backend to show results in just a few lines of code. It relies on the various steps needed to build a machine learning model to make the results understandable. The image below shows how the Shapash package works:

It works on the following principle:

First, it compiles the elements of each step: data preparation, feature engineering, model fitting, model evaluation, and model understanding.
Secondly, it provides a web app and charts to understand the model better. We can share and discuss our results with clients.
Lastly, it provides a summary of explainability.

Installation

Shapash can be installed using the below code:

pip install shapash

For Jupyter Notebook: if you are using a Jupyter notebook and want to see inline graphs, you need to run one more command in addition to the one above:

pip install ipywidgets

Getting Started

In this section, we explore Shapash using the House Prices Prediction dataset. It is a regression problem in which we have to predict house prices. The link for the dataset is here. First, we analyse the dataset with univariate and multivariate analysis. Then we look at model explainability using the feature importance, feature contribution, local, and compare plots. After that, we look at model performance and, finally, the web app.
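Before any of these views can be produced, the model has to be handed to a SmartExplainer. The sketch below shows one way this might look. It assumes the Shapash 2.x call signature; the helper name `build_explainer` is hypothetical, and the model, test DataFrame, and predictions are expected to be prepared beforehand.

```python
def build_explainer(model, x_test, y_pred, features_dict=None):
    """Compile a Shapash explainer for a fitted model.

    Assumptions (not from the article): `model` is a fitted regressor,
    `x_test` is a pandas DataFrame, and `y_pred` is a one-column DataFrame
    of predictions that shares the index of `x_test`.
    """
    # Imported lazily so that this function can be defined without Shapash.
    from shapash.explainer.smart_explainer import SmartExplainer

    # Shapash 2.x style. In 1.x the model is passed to compile() instead:
    #   xpl = SmartExplainer(features_dict=features_dict)
    #   xpl.compile(x=x_test, model=model, y_pred=y_pred)
    xpl = SmartExplainer(model=model, features_dict=features_dict)
    xpl.compile(x=x_test, y_pred=y_pred)
    return xpl
```

The optional `features_dict` maps raw column names to readable labels, for example `{"1stFlrSF": "First Floor Square Feet"}` if the dataset uses that column name. This is what lets the plots show labels that end users can read.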

Dataset Analysis

Univariate Analysis

You can see the image below for a feature named First Floor Square Feet. A table shows summary statistics of the train and test datasets, such as mean, max, min, standard deviation, median, and many more. On the right-hand side, we can see distribution graphs for both the training and test datasets. Shapash also indicates whether the feature is categorical or numerical, and a drop-down lets you pick any of the available features.

For categorical features, distinct values and missing values are shown for the training and test dataset. On the right-hand side, a bar plot is shown with the respective percentage of category in a feature.
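The statistics described above are straightforward to reproduce by hand. The sketch below uses only the Python standard library and made-up values. It is meant to show what the tables contain, not how Shapash computes them.

```python
import statistics


def describe(values):
    """Basic statistics for one numerical feature."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def category_shares(values):
    """Distinct count, missing count, and percentage share per category."""
    present = [v for v in values if v is not None]
    counts = {}
    for v in present:
        counts[v] = counts.get(v, 0) + 1
    shares = {k: 100.0 * c / len(present) for k, c in counts.items()}
    return {
        "distinct": len(counts),
        "missing": len(values) - len(present),
        "shares": shares,
    }


# Made-up values standing in for the train and test split of one feature.
train_area = [856, 1262, 920, 961, 1145]
test_area = [896, 1329, 928]
train_stats, test_stats = describe(train_area), describe(test_area)

# A made-up categorical feature with one missing value.
zoning = category_shares(["RL", "RM", "RL", None, "RL"])
```

Comparing `train_stats` with `test_stats` side by side is the same check the report table supports: a large gap between the two suggests that the test data is distributed differently from the training data.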

Target Analysis

We can also see the detailed analysis of our target variable called Sales Price. On the left-hand side, all the statistics are shown like count, mean, standard deviation, min, max, median, and many more for both training and prediction datasets. On the right-hand side, a distribution is shown for both training and prediction datasets.

Multivariate Analysis

In the previous section, we had a detailed discussion on univariate analysis. In this section, we are going to see Multivariate Analysis. The below image shows the Correlation Matrix of the top 20 features for both training and testing datasets. A correlation scale is also shown based on different colors. That’s how we can visualize the relation between features using Shapash.
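For readers who want to see what sits behind such a matrix, here is a minimal sketch of a Pearson correlation matrix in plain Python. The feature names and values are made up, and Pearson correlation is an assumption on my part; the report may use a different coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values; the feature names are illustrative only.
data = {
    "area": [50.0, 80.0, 100.0, 120.0],
    "rooms": [2.0, 3.0, 4.0, 5.0],
    "age": [40.0, 30.0, 20.0, 10.0],
}
corr = correlation_matrix(data)
```

Values close to 1 or -1 correspond to the strongest colors on the correlation scale, and values near 0 to the weakest.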

Model Explainability

Feature Importance Plot

By using this package, we can see feature importance. Feature importance is a way to measure how much each input feature matters in predicting the output value. The image below shows the feature importance plot.
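One common way to turn local contributions into a global importance score is to add up their absolute values per feature and normalize. The sketch below assumes that aggregation and uses made-up contributions; the exact computation in Shapash may differ in its details.

```python
def feature_importance(contributions):
    """Global importance from local contributions.

    `contributions` is a list with one dict per row, mapping a feature
    name to its local contribution for that row.
    """
    totals = {}
    for row in contributions:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    grand_total = sum(totals.values())
    importance = {f: t / grand_total for f, t in totals.items()}
    # Most important feature first, as in a feature importance chart.
    return dict(sorted(importance.items(), key=lambda item: item[1], reverse=True))


# Made-up local contributions for two predictions.
local = [
    {"area": 3000.0, "quality": -1000.0, "garage": 0.0},
    {"area": -1000.0, "quality": 2000.0, "garage": 1000.0},
]
importance = feature_importance(local)
```

Taking absolute values matters here: a feature that pushes some predictions up and others down would otherwise cancel itself out and look unimportant.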

Feature Contribution Plot

These curves help us answer questions such as: how does a feature impact my prediction, and does it contribute positively or negatively? This plot complements feature importance for interpretability: it shows the global coherence of the model and helps us better understand the impact of a feature on the model.

We can see the contribution plots for both numerical and categorical features.

For numerical feature:

For categorical feature:

Local Plot

We can draw local plots, which explain a single prediction by showing the features that contributed most to it. The image below shows the local plot:

Compare Plot

We can draw compare plots, which put the contributions of several individual predictions side by side. The image below shows the compare plot:
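The four explainability plots in this section can be requested from a compiled explainer through its `plot` attribute. The sketch below assumes `xpl` is a compiled SmartExplainer; the helper name `draw_explainability_plots` and its arguments are hypothetical, and a feature name such as "OverallQual" or a row index such as 560 would depend on your own dataset.

```python
def draw_explainability_plots(xpl, feature, row_index, compare_indexes):
    """Build the four explainability figures for a compiled explainer.

    `feature` is a column name, `row_index` is one index value of the
    explained DataFrame, and `compare_indexes` is a list of index values.
    """
    return {
        # Global view: which features matter most overall.
        "importance": xpl.plot.features_importance(),
        # Global view: how one feature's value drives its contribution.
        "contribution": xpl.plot.contribution_plot(feature),
        # Local view: why a single row got its prediction.
        "local": xpl.plot.local_plot(index=row_index),
        # Local view: several rows explained side by side.
        "compare": xpl.plot.compare_plot(index=compare_indexes),
    }
```

In a notebook, each call can also be made on its own as the last line of a cell so that the figure is rendered inline.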

Model Performance

After the data analysis, we train the machine learning model. The image below shows the output of our prediction. On the left-hand side, statistics such as count, min, max, median, and standard deviation are shown for the true values and the predicted values. On the right-hand side, a distribution is shown for both predicted and actual values.

Web App

After training the model, we can build a web app as well. This web app shows a complete dashboard of our data, including everything we have covered so far. The image below shows the dashboard.
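Starting and stopping the dashboard might look like the sketch below. It assumes `xpl` is a compiled SmartExplainer; the helper names, the title, and the port number are illustrative choices, not values taken from the article.

```python
def launch_dashboard(xpl, title="House Prices", port=8050):
    """Start the Shapash web app for a compiled explainer `xpl`."""
    # run_app() serves the dashboard in the background and returns a handle.
    return xpl.run_app(title_story=title, port=port)


def stop_dashboard(app):
    """Stop a dashboard previously started with launch_dashboard()."""
    # Releases the port once the exploration is over.
    app.kill()
```

Keeping the handle returned by `launch_dashboard` is what makes it possible to shut the app down cleanly from the same notebook session.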

For more information, check this Link

Conclusion

In this blog, we have discussed Shapash in detail. You can try this library to make your machine learning models easier to understand and to save time when explaining them.

You can check my articles here: Articles

Thanks for reading this article and for your patience. Do let me know your thoughts in the comment section. Please share this article; it will give me the motivation to write more blogs for the data science community.

Email id: gakshay1210@gmail.com

Follow me on LinkedIn: LinkedIn

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
