How to Save and Load Machine Learning Models in Python Using Joblib Library?

Pallav Sharma Last Updated : 03 Dec, 2024

Machine learning models often need large datasets to reach high accuracy, and training a model on a large dataset takes a considerable amount of time. Rather than retraining the model every time we need it, we can use the joblib library to train the model once, save it to disk, and then reload the same trained model whenever it is required.

Joblib Library

This post will look at using Python’s joblib package to save and load machine learning models. For this project, Google Colab is used.

Joblib is a Python library for running computationally intensive tasks in parallel. It provides a set of functions for performing operations in parallel on large data sets and for caching the results of computationally expensive functions. Joblib is especially useful for machine learning models because it allows you to save the state of your computation and resume your work later or on a different machine.

Learning Objectives

  • Understand why saving our machine learning models is useful and how the Joblib library helps.
  • Learn how to use the joblib library to save and load a trained machine learning model.
  • Understand the functions used to save and load models, namely “dump” and “load.”

This article was published as a part of the Data Science Blogathon.

Why Should you Use Joblib?

Compared to other ways of storing and loading machine learning models, Joblib has a number of benefits. It is optimized for objects that contain large NumPy arrays, as most scikit-learn models do, so it can often serialize them faster and, with optional compression, in less disk space than plain pickling. Its dump and load functions are simple one-liners, which makes persistence less error-prone than hand-rolled pickling code. Last but not least, joblib makes it easy to save numerous iterations of the same model under different filenames, making it simpler to compare them and identify the most accurate one.

Joblib also provides simple parallel-computing helpers that let programmers spread independent tasks across the cores of a single machine and, with additional backends such as Dask, across a cluster. This makes it easy to put available computing resources to work and accelerate tasks such as model selection and training.
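As a quick, minimal sketch of those parallel helpers (the function square and the worker count below are illustrative, not part of the original article), Parallel and delayed can run an ordinary Python function over many inputs on several cores:

from joblib import Parallel, delayed

def square(x):
    # stand-in for any expensive, independent computation
    return x ** 2

# run the function over ten inputs using two worker processes
results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(10))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]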

Import Joblib

Import joblib using the following code:

# importing the joblib library
import joblib

If the above code gives an error, joblib is not installed in your environment.

Install joblib using the following command:

!pip install joblib

Make a Machine Learning Model

We will make a logistic regression model for this purpose and use the iris dataset available in sklearn.datasets.

The Iris dataset is a well-known dataset in the field of machine learning and statistics. It contains 150 observations of iris flowers and the measurements of their sepals and petals. The dataset includes 50 observations for each of three species of iris flowers (Iris setosa, Iris virginica, and Iris versicolor). The measurements included in the dataset are sepal length, sepal width, petal length, and petal width. The Iris dataset is commonly used as a benchmark for classification algorithms as it is small, well-understood, and multi-class.

Logistic regression is a statistical method used for binary classification problems. It models the relationship between a dependent variable and one or more independent variables, estimating the probability of an event occurring based on the values of the independent variables. The output is a probability between 0 and 1, which can then be thresholded to make a binary decision about the class of the event. Logistic regression is widely used in fields such as medicine, marketing, and finance because of its simplicity, interpretability, and ability to handle various data types and distributions. Although it is described here in binary terms, scikit-learn's LogisticRegression extends naturally to multi-class problems such as the three Iris species, using one-vs-rest or multinomial formulations. Despite its simplicity, logistic regression is a powerful tool for many classification problems and is often a good starting point before moving to more complex machine learning models.

from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

# load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# fit a logistic regression model
reg = linear_model.LogisticRegression()
reg.fit(X_train, y_train)

Saving the Model Using Joblib

Saving our trained machine learning model using the dump function of the joblib library.

# save the model to a file
joblib.dump(reg, 'regression_model.joblib')

# the first argument is the trained model object and the second is the name of the
# file in which we want to save it

# now the model named 'reg' will be saved as 'regression_model.joblib' in the current
# directory.
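As an optional variant (not shown in the original article), joblib.dump also accepts a compress argument; values from 0 to 9 trade extra CPU time for a smaller file on disk. The filename below is illustrative:

# save a compressed copy of the model; compress=3 is a reasonable middle ground
joblib.dump(reg, 'regression_model_compressed.joblib', compress=3)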

The image below shows the current working directory before saving the model with the joblib library.

(Image: current working directory before saving the model)

Below is the screenshot after saving the model using the joblib dump method.

(Image: current working directory after saving the model)

You can clearly see that after running joblib.dump(reg, 'regression_model.joblib'), a new file named 'regression_model.joblib' appears in the current directory.
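If you are not using Colab's file browser, you can confirm the file was written with a quick directory check; this small sketch uses only the standard library:

import os

# verify that the saved model file exists in the current working directory
print('regression_model.joblib' in os.listdir('.'))  # True if the dump succeeded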

Loading the Saved Model Using Joblib

Loading regression_model.joblib so that it can be used to make predictions.

# load the saved model
reg = joblib.load('regression_model.joblib')

Make Predictions Using the Loaded Model

Making predictions for the test dataset using our trained ML model.

# use the loaded model to make predictions
predictions = reg.predict(X_test)
predictions

Output:

(Image: array of class labels predicted by the loaded model for the test set)
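As an extra sanity check (not part of the original walkthrough), you can confirm that the reloaded model behaves like the one you trained by scoring it on the held-out test set:

# mean accuracy of the reloaded model on the test set
print(reg.score(X_test, y_test))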

The joblib library is very useful when we want to use machine learning models in applications and websites.

Joblib can be useful in development in several ways:

1. Debugging and Profiling: It can be challenging to identify which sections of code take the longest to execute when building a large application with many functions. Joblib's helpers make this easier to observe; for example, the verbose option of Parallel reports progress and elapsed time for batches of tasks, which helps you locate and speed up the slowest areas of your application.

2. Reproducibility: Repeating the same calculations several times can be time-consuming when working with huge datasets. Joblib's Memory class lets you cache the results of expensive computations so they can be reused without running the code again (see the caching sketch after this list). This saves time and helps keep your results reproducible.

3. Testing: Writing tests is crucial when creating a complex program, since they ensure that the code behaves as intended. Independent checks and evaluation runs can be executed concurrently with joblib's Parallel, so you learn about the state of your code more quickly. This can speed up your development process and let you write and run more tests in less time.

4. Experimentation: Running several iterations of the code simultaneously can be useful when creating a new algorithm or testing out various strategies. Joblib offers a straightforward method for running various iterations of your code concurrently so you can rapidly compare their outcomes and determine which strategy works best.
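To make point 2 concrete, here is a minimal caching sketch. The cache directory (./joblib_cache) and the slow_computation function are illustrative choices, not part of the original tutorial:

from joblib import Memory

# store cached results on disk in a local folder (created if it does not exist)
memory = Memory('./joblib_cache', verbose=0)

@memory.cache
def slow_computation(n):
    # stand-in for an expensive, deterministic computation
    return sum(i * i for i in range(n))

print(slow_computation(10_000_000))  # computed and written to the cache on the first call
print(slow_computation(10_000_000))  # returned from the disk cache on subsequent calls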

Conclusion

In conclusion, joblib can be helpful in development by offering tools for debugging and profiling, ensuring reproducibility, accelerating testing, and enabling experimentation. With these features, you can build larger and more intricate applications with greater productivity and efficiency.

The key takeaways of this article are as follows:

  • It gives developers working with Python-based machine learning frameworks such as scikit-learn an effective way to save and load their trained models instantly, without redoing the time-consuming and expensive training process from scratch each time a model is needed.
  • It also lets developers take advantage of parallelization across multiple cores of a single machine (and, with additional backends, across multiple machines), making higher performance achievable at lower cost.
  • So if you’re looking for an easy way to streamline how you create and store models in Python, look no further than joblib!

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion. 

Hey guys, my name is Pallav Sharma. I am currently pursuing my B.Tech degree. I enjoy solving complex problems and finding insights in data, and I enjoy machine learning and data science. My fascination with these fields continuously drives me to learn and improve my skills.
