A brief introduction to building interactive ML WebApps With Streamlit

Shivangi Last Updated : 21 May, 2021

8 min read

This article was published as a part of the Data Science Blogathon

Objectives

The article focuses on building beautiful and interactive ML web apps without any web development experience!
We will cover data loading, visualization, interaction, and generating dashboards in just a few lines of pure python code!
The machine learning interactive dashboards will provide users to choose any classification algorithm, select evaluation metrics, and switch hyper-parameters on the go in real-time, all without any coding experience.

Introduction

Don’t you wish to build awesome interactive web applications for your machine learning projects!! but web development, not your forte?

Streamlit is your new way out! Developing and deploying powerful machine learning web apps have never been so easy!

Streamlit is an open-source python library that allows swift custom web app building revolving data science, machine learning, and much more. Streamlit aims at building and deploying applications without the necessity of any web development knowledge.

Why choose Streamlit?

Streamlit library offers a set of unique features which sets it apart.

Pure python scripting without the need for HTML, CSS, etc.
Data caching features available
Fast and easy build and deployment
Supports several data science frameworks
Creates interactive applications without any complexity
Offers widgets as variables

In just a few minutes we can easily generate exquisite dashboards for our machine learning project — so let’s get started!

Learning Objectives

Use Streamlit and Python to build an interactive machine learning dashboard
Train multiple classifiers including Logistic Regression, Random Forest, and Support Vector Classifiers
Switch and Select hyperparameter settings for each classification algorithm
Plot evaluation metrics for the classifiers

Setting up the application

We can set up the application in a few easy steps.

Installation

Install the python Streamlit library

pip install streamlit

Importing

Import the required libraries for the machine learning application.

from numpy.core.numeric import True_
from sklearn import metrics
import streamlit as st
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_splitfrom sklearn.metrics import plot_confusion_matrix, plot_roc_curve, plot_precision_recall_curve
from sklearn.metrics import precision_score, recall_score

Initialization

Let’s start very simple by creating a python script that prints text to the Web application.

def main():
    st.title(“Introduction to building Streamlit WebApp”)
    st.sidebar.title(“This is the sidebar”)
    st.sidebar.markdown(“Let’s start with binary classification!!”)
if __name__ == ‘__main__’:
    main()

We can initialize the application by defining the main method. We call streamlit by st and then specify the widget or function that needs to be added to the application.

For creating the main title we use st.title. Section titles can be added by st.header or st.subheader. Streamlit also supports Markdowns by st.markdown.

We can build a sidebar for adding interactive elements so that the users can make desired choices for hyperparameter tuning and model selection
The results would be reflected on the main page. Sidebar title can be added by st.sidebar.title.

Running the application

Switch to the terminal and run the application by the following command, this should automatically open up the Web browser. And this would be running on localhost with the default port 8501- http://localhost:8501

streamlit run app.py

Any further changes made to the code in the editor will be automatically updated after saving without needing to recompile.

Streamlit application

Data Loading

Now we will be loading the freely available dataset from the UCI machine learning repositories, We can use this dataset for binary classification. It consists of multiple columns, and each column is of a categorical data type so we need to perform some sort of encoding to convert these categorical features into numerical data.

For the data loading process, we define a function inside the main method to read the CSV file and store it as a pandas data frame. We make a function to iterate through each column in our data frame and fit transform at the data so it’s labelled and coded.

@st.cache(persist= True)
def load():
    data= pd.read_csv("mushrooms.csv")
    label= LabelEncoder()
    for i in data.columns:
        data[i] = label.fit_transform(data[i])
return data
df = load()

And since we don’t want our app to load the data every single time any change is made to the code. For this, we can use a streamlit decorator @st.cache(persist= True) which helps us to cache the outputs to disk and use it at any time.

We can add the additional functionality of displaying raw data on the web app by adding a checkbox at the sidebar which would reflect the data on the main section if selected.

if st.sidebar.checkbox("Display data", False):
    st.subheader("Show Mushroom dataset")
    st.write(df)

Creating Training and Test splits

The first step for model creation is to create training and test splits. We can create a split function for the same. Now we can use the train test split function from the sklearn library to split our dataset and drop the target column.

@st.cache(persist=True)
def split(df):
    y = df.type
    x = df.drop(columns=["type"])
    x_train, x_test, y_train, y_test =     train_test_split(x,y,test_size=0.3, random_state=0)
    
return x_train, x_test, y_train, y_test
x_train, x_test, y_train, y_test = split(df)

Now we can move forward to model training and model evaluation.

Evaluation Metrics

We would create a function for plotting various evaluation metrics. This lets the user select any evaluation metrics such as Confusion matrix, ROC Curve and Precision-Recall Curve as it is a binary classification problem.

Create a metrics list and a subheader for creating the plots. Let us also
create a class that has two values edible and poisonous.

def plot_metrics(metrics_list):
    if "Confusion Matrix" in metrics_list:
        st.subheader("Confusion Matrix")
        plot_confusion_matrix(model, x_test, y_test, display_labels=   class_names)
        st.pyplot()
    if "ROC Curve" in metrics_list:
        st.subheader("ROC Curve")
        plot_roc_curve(model, x_test, y_test)
        st.pyplot()
    if "Precision-Recall Curve" in metrics_list:
        st.subheader("Precision-Recall Curve")
        plot_precision_recall_curve(model, x_test, y_test)
        st.pyplot()
class_names = ["edible", "poisnous"]

As soon as the selected evaluation function is called, it will create a matplotlib plot that would be displayed on the main page.

Training an SVM classifier

Let’s add a subheader to the sidebar allowing the user to choose a classifier. Next, we can construct a widget that shows a list of classifiers in a drop-down fashion. We can add our classifier options starting with SVM, logistic regression, and the last one as the random first question.

st.sidebar.subheader("Choose classifier")
classifier = st.sidebar.selectbox("Classifier", ("Support Vector Machine (SVM)", "Logistic Regression", "Random Forest"))

webapps with streamlit| svm — Adding options for classifiers

Adding SVM hyperparameters

After the user selects the SVM classifier, a list of hyper-parameters is displayed. We can add a regularization parameter and provide a range of 0.1 to 10.0 which can be handled by plus and minus signs. The power of SVMs is really by choosing their kernels.

We can simply add a radio button that allows users to select from either
of two kernels — RBF and linear. And lastly, let’s add the last hyperparameter called Gamma.

As we also want the evaluation metrics to be plotted onto the web app. So let’s also add the feature that caters to the user’s required evaluation metrics for this, we can create a multi-select widget that allows the user to input one or more metrics.

if classifier == "Support Vector Machine (SVM)":
    st.sidebar.subheader("Hyperparameters")
    C = st.sidebar.number_input("C (Regularization parameter)", 0.01, 10.0, step=0.01, key="C")
    kernel = st.sidebar.radio("Kernel", ("rbf", "linear"), key="kernel") 
    gamma = st.sidebar.radio("Gamma (Kernal coefficient", ("scale", "auto"), key="gamma")
metrics = st.sidebar.multiselect("What metrics to plot?", ("Confusion Matrix", "ROC Curve", "Precision-Recall Curve"))

Training Model

Moving forward towards creating the actual classifier. Firstly we shall instantiate the model and fit our training data. Followed by calculation of accuracy and then performing predictions on the test dataset.

We are not only concerned with the accuracy but also precision and recall hence we would like to display them on the main screen.

Enclose the following code in the if the block of the SVM classifier.

if st.sidebar.button("Classify", key="classify"):
    st.subheader("Support Vector Machine (SVM) results")
    model = SVC(C=C, kernel=kernel, gamma=gamma)
    model.fit(x_train, y_train)
    accuracy = model.score(x_test, y_test)
    y_pred = model.predict(x_test)
    st.write("Accuracy: ", accuracy.round(2))
    st.write("Precision: ", precision_score(y_test, y_pred, labels=class_names).round(2))
    st.write("Recall: ", recall_score(y_test, y_pred, labels=class_names).round(2)) 
    plot_metrics(metrics)

Awesome!! we have completed the SVM classifier allowing the user to experiment with various evaluation metric and hyperparameter tuning features

svm confusion matrix — Confusion matrix — SVM

Change the regularization parameter and checkout the different evaluation metrics!

svm results | webapps with streamlit — ROC Curve-SVM

We can also switch the kernel and notice how the accuracy improves! We can analyze the same by the evaluation metrics.

Training Logistic Regression Classifier

Let’s build out the functionality for our logistic regression classifier just as we did for SVM.

Fit the logistic regression model and get the accuracy and prediction for the same and write the corresponding results on the main page. Let’s not forget to make certain minor changes such as adding a slider that will allow the user to choose the maximum iteration. Fix the lower and upper bound to 100 and 500 respectively.

Change the hyperparameters according to the logistic regression classifier and remove the Kernel and Gamma parameters.

if classifier == "Logistic Regression":
    st.sidebar.subheader("Hyperparameters")
    C = st.sidebar.number_input("C (Regularization parameter)", 0.01, 10.0, step=0.01, key="C_LR")
    max_iter = st.sidebar.slider("Maximum iterations", 100, 500, key="max_iter")
    metrics = st.sidebar.multiselect("What metrics to plot?", ("Confusion Matrix", "ROC Curve", "Precision-Recall Curve"))
    
    if st.sidebar.button("Classify", key="classify"):
        st.subheader("Logistic Regression Results")
        model = LogisticRegression(C=C, max_iter=max_iter)
        model.fit(x_train, y_train)
        accuracy = model.score(x_test, y_test)
        y_pred = model.predict(x_test)
        
        st.write("Accuracy: ", accuracy.round(2))
        st.write("Precision: ", precision_score(y_test, y_pred, labels=class_names).round(2))
        st.write("Recall: ", recall_score(y_test, y_pred, labels=class_names).round(2))
        plot_metrics(metrics)

Now we are all set for comparing the performance of both the classifiers!

confusion matrix | webapps with streamlit

Training Random Forest classifier

Welcome to the final task of our project! Here we would create a random forest classifier. Start by clearing out the previous parameters and add new parameters such as the number of estimators that defines the number of trees in the forest for this we add an input box, set the default number as 100, and define 10 as the increment value.

Add other parameters as well, the first one being the max depth of the forest and this time set increment step by just 1. Next, we can include a radio button that allows the user to select bootstrap samples when building trees.

if classifier == "Random Forest":
    st.sidebar.subheader("Hyperparameters")
    n_estimators= st.sidebar.number_input("The number of trees in the forest", 100, 5000, step=10, key="n_estimators")
    max_depth = st.sidebar.number_input("The maximum depth of tree", 1, 20, step =1, key="max_depth")
    bootstrap = st.sidebar.radio("Bootstrap samples when building trees", ("True", "False"), key="bootstrap")
    
    metrics = st.sidebar.multiselect("What metrics to plot?", ("Confusion Matrix", "ROC Curve", "Precision-Recall Curve"))
    
    if st.sidebar.button("Classify", key="classify"):
        st.subheader("Random Forest Results")
        model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, bootstrap= bootstrap, n_jobs=-1 )
        model.fit(x_train, y_train)
        accuracy = model.score(x_test, y_test)
        y_pred = model.predict(x_test)
        st.write("Accuracy: ", accuracy.round(2))
        st.write("Precision: ", precision_score(y_test, y_pred, labels=class_names).round(2))
        st.write("Recall: ", recall_score(y_test, y_pred, labels=class_names).round(2))
        plot_metrics(metrics)

Perform classification using Random forest classifier and notice how we have very high accuracy and desirable position and recall values.

End Notes

We have successfully used the Streamlit library and Python to build a simple yet powerful machine learning WebApp.We have also learned to the title, add markdown text, and widgets such as dropdowns, radio buttons, number inputs, checkboxes, and sliders.
Not only that, we have implemented these widgets flexibly to allow the user to perform model selection and set hyperparameters, allowing users to pick various evaluation metrics.

With this hands-on project, you must have realized the real power of the streamlit library!! Streamlit allows anyone with no formal machine learning or coding background can easily work around with ML models.

The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion.

Shivangi

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Jorge A. Shorter

Thank's, It's a great thing that this article suggested the importance of Streamlit Web API for NLP.

Really like these new tips, which I haven't heard of before, like the Streamlit Web API for NLP. Can’t wait to implement some of these as soon as possible.

Shubham Pandey

This is amazing, I will try this for my project, Thanx

Reading list

Basics of Machine Learning

Machine Learning Lifecycle

Importance of Stats and EDA

Understanding Data

Probability

Exploring Continuous Variable

Exploring Categorical Variables

Missing Values and Outliers

Central Limit theorem

Bivariate Analysis Introduction

Continuous - Continuous Variables

Continuous Categorical

Categorical Categorical

Multivariate Analysis

Different tasks in Machine Learning

Build Your First Predictive Model

Evaluation Metrics

Preprocessing Data

Linear Models

KNN

Selecting the Right Model

Feature Selection Techniques

Decision Tree

Feature Engineering

Naive Bayes

Multiclass and Multilabel

Basics of Ensemble Techniques

Advance Ensemble Techniques

Hyperparameter Tuning

Support Vector Machine

Advance Dimensionality Reduction

Unsupervised Machine Learning Methods

Recommendation Engines

Improving ML models

Working with Large Datasets

Interpretability of Machine Learning Models

Automated Machine Learning

Model Deployment

Deploying ML Models

Embedded Devices

A brief introduction to building interactive ML WebApps With Streamlit

Objectives

Introduction

Why choose Streamlit?

Learning Objectives

Setting up the application

Installation

Importing

Initialization

Running the application

Data Loading

Creating Training and Test splits

Evaluation Metrics

Training an SVM classifier

Adding SVM hyperparameters

Training Model

Training Logistic Regression Classifier

Training Random Forest classifier

End Notes

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck