MLRun: Introduction to MLOps framework

Mohammad Last Updated : 13 Jul, 2021

This article was published as a part of the Data Science Blogathon

Overview

An introduction to MLOps.
The MLRun library and its features.
The architecture of the MLRun framework, with examples of its main components.

Introduction

Like DevOps, MLOps (machine learning operations) is a set of practices that aims to make developing and maintaining production machine learning seamless and efficient. MLOps seeks to increase automation and improve the quality of production models while also addressing business and regulatory requirements. A common MLOps architecture includes data science platforms, where models are built, and analytical engines, where computations are performed. The MLOps tool orchestrates the movement of machine learning models, data, and outcomes between these systems. Goals that enterprises want to achieve through MLOps systems include rapid deployment, pipeline automation, feature and log management, and reproducibility of models and predictions.

MLRun

MLRun is an open-source MLOps framework that provides seamless and efficient management of your machine learning pipeline from early development to full production deployment.

Key benefits provided by the MLRun framework include:

Rapid development of code from early stage to production.
Elastic scaling of batch and real-time workloads.
Feature management – preparation and monitoring of logs.
Works anywhere — IDE, multi-cloud, etc.

MLRun is composed of several convenient abstraction layers that provide features across a wide variety of technologies, such as automating the build process, execution, data movement, scaling, versioning, parameterization, output tracking, and more. In every ML experiment, we ideally want to save our code, configuration, results, logs, inputs, outputs, etc., so that we can reproduce the experiment in different development environments. MLRun helps manage, save, and reproduce experiments without any hassle.

MLRun is composed of the following layers:

Feature and Artifact Store — handle the ingestion, processing, metadata, and storage of data and features across multiple repositories and technologies.
Elastic Serverless Runtimes — converts simple code to scalable and managed microservices with workload-specific runtime engines (such as Kubernetes jobs, Nuclio, Dask, Spark, and Horovod).
ML Pipeline Automation — automates data preparation, model training and testing, deployment of real-time production pipelines, and end-to-end monitoring.
Central Management — provides a unified portal for managing the entire MLOps workflow. The portal includes a UI, a CLI, and an SDK, which are accessible from anywhere.

Read more about the MLRun framework in its GitHub repository.

The architecture of the MLRun framework

The architecture consists of several basic components; combining these components creates a pipeline.
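To make the idea of combining components into a pipeline concrete, here is a minimal sketch in plain Python. It does not use MLRun at all: the step functions (fetch, transform, summarize) and the run_pipeline helper are hypothetical names chosen for illustration, and a real MLRun pipeline adds tracking, scaling, and storage on top of this basic chaining idea.

```python
from typing import Any, Callable, List


def fetch(_: Any) -> List[int]:
    """Stand-in for a data-fetching step; returns a tiny hard-coded dataset."""
    return [3, 1, 2]


def transform(rows: List[int]) -> List[int]:
    """Stand-in for a data-preparation step; sorts the rows."""
    return sorted(rows)


def summarize(rows: List[int]) -> dict:
    """Stand-in for a training/evaluation step; returns simple statistics."""
    return {"count": len(rows), "max": max(rows)}


def run_pipeline(steps: List[Callable[[Any], Any]], initial: Any = None) -> Any:
    """Run the steps in order, passing each step's output to the next one."""
    result = initial
    for step in steps:
        result = step(result)
    return result


summary = run_pipeline([fetch, transform, summarize])
print(summary)  # {'count': 3, 'max': 3}
```

Each step only needs to agree with its neighbours on what it receives and returns, which is what lets steps be swapped or reused between pipelines.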

To install MLRun on your device, run the following command in your terminal:

pip install mlrun

Let’s discuss some of the main components of MLRun with examples.

1. Project –

A project is a container that consists of all your source code, metadata, artifacts, logs, models, etc. It helps organize all the activities related to your ML experiment.

You can define the project name and then use mlrun.set_environment to set it.

from os import path
import mlrun

project_name_base = 'Project_name' # Mention Your Project Name Here

project_name, artifact_path = mlrun.set_environment(project=project_name_base, user_project=True)

print(f'Project name: {project_name}')

Output (because user_project=True, MLRun appends the current username to the base name):
Project name: Project_name-<username>

2. Function –

Functions are small packages of code that execute the individual steps of our pipeline. These steps include, but are not limited to, fetching data, transforming data, training multiple models, and testing. Below is a simple example of a function that fetches data from MongoDB Atlas.

A function can be created using four different methods:

mlrun.new_function
mlrun.code_to_function
mlrun.import_function
mlrun.function_to_module

We define a simple Python function. We can store this function in a source file and use mlrun.code_to_function to create a function object.

import pandas as pd
import pymongo
from os import path
from mlrun.datastore import DataItem
from mlrun.execution import MLClientCtx

def fetch_data(context: MLClientCtx, data_path: DataItem):
    context.logger.info('Reading data from {}'.format(data_path))
    # Connect to MongoDB and select the database and collection to read
    m_client = pymongo.MongoClient("Mention The Link of Your MongoDB Client Here")
    m_db = m_client["DB_name"]
    db_cm = m_db["DB_name"]
    # Load every document of the collection into a DataFrame
    df = pd.DataFrame.from_records(db_cm.find())
    suicide_dataset = df
    target_path = path.join(context.artifact_path, 'data')
    context.logger.info('Saving datasets to {} ...'.format(target_path))
    # Store the data sets in your artifacts database
    context.log_dataset('suicide_dataset', df=suicide_dataset, format='csv',
                        index=False, artifact_path=target_path)

3. Run –

When a function is executed, all information about the execution is stored in an object known as the run object. The run object is created whenever you run a function, and it stores information such as the function's attributes (arguments, inputs, and outputs), results, and logs.

We first define the function object. This object can be used to execute any of the functions defined in the source code:

func_obj = mlrun.code_to_function(name='f_obj', kind='job', filename='Path of the Source code')

fetch_data_run_obj = func_obj.run(handler='fetch_data', inputs={'data_path': 'Mention Path of the DATA CSV'}, local=True)

We use this object to run our function: in handler we pass the function name, and in inputs we pass the arguments of the function.

fetch_data_run_obj.outputs

This gives the outputs of the function, in this case the fetched dataset.
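To illustrate what a run object conceptually records, here is a toy sketch in plain Python. It is not MLRun's run class or its internals: ToyRun, execute, and count_rows are hypothetical names invented for this example, and the handler only pretends to load a dataset.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToyRun:
    """Hypothetical record of one function execution (not MLRun's run class)."""
    handler: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any] = field(default_factory=dict)
    logs: List[str] = field(default_factory=list)
    state: str = "created"
    duration_sec: float = 0.0


def execute(handler: Callable[..., Dict[str, Any]], inputs: Dict[str, Any]) -> ToyRun:
    """Call the handler with the given inputs and record what happened."""
    run = ToyRun(handler=handler.__name__, inputs=dict(inputs))
    start = time.perf_counter()
    try:
        run.logs.append(f"starting {run.handler}")
        run.outputs = handler(**inputs)
        run.state = "completed"
    except Exception as exc:  # keep the failure in the record instead of raising
        run.logs.append(f"error: {exc}")
        run.state = "error"
    run.duration_sec = time.perf_counter() - start
    return run


def count_rows(data_path: str) -> Dict[str, Any]:
    """Hypothetical handler: pretends to load a dataset and reports its size."""
    rows = ["a", "b", "c"]
    return {"data_path": data_path, "row_count": len(rows)}


toy_run = execute(count_rows, {"data_path": "data.csv"})
print(toy_run.state, toy_run.outputs)
# completed {'data_path': 'data.csv', 'row_count': 3}
```

Keeping the handler name, inputs, outputs, logs, and state together in one record is what makes it possible to inspect or reproduce an execution later.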

4. Artifact –

Artifacts are data objects (such as datasets, graphs, pickle files, and models) that are produced or used by functions, runs, and workflows. We pass an artifact directory name, which is the directory where you want to store your data. The directory structure is given below:

─── Artifact directory
    ├── Data
        ├── data (All your datasets)
        ├── model (saved model and model config)
    ├── artifacts/project_name-username (Contain all your artifact data)
    ├── functions/project_name-username (Contain all your function data)
    ├── runs/project_name-username (Contain all your run object data)
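As a rough illustration of this layout, the sketch below recreates the same folders inside a temporary directory using only the standard library. It follows one reading of the tree above; the helper name create_artifact_layout and the example project and user names are assumptions made for this example. MLRun creates and fills these folders itself, so this is only meant to make the structure easier to picture.

```python
import tempfile
from pathlib import Path
from typing import List


def create_artifact_layout(root: Path, project: str, user: str) -> List[str]:
    """Create the folders from the tree above under root; return their relative paths."""
    project_dir = f"{project}-{user}"
    subdirs = [
        Path("Data") / "data",              # all your datasets
        Path("Data") / "model",             # saved model and model config
        Path("artifacts") / project_dir,    # artifact data
        Path("functions") / project_dir,    # function data
        Path("runs") / project_dir,         # run object data
    ]
    for sub in subdirs:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return [sub.as_posix() for sub in subdirs]


with tempfile.TemporaryDirectory() as tmp:
    created = create_artifact_layout(Path(tmp), "Project_name", "username")
    all_exist = all((Path(tmp) / sub).is_dir() for sub in created)
    print(created)
```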

Conclusion

One of the most difficult parts of the machine learning development lifecycle is production deployment and its management. MLOps defines a set of practices that aim to make developing and maintaining production machine learning seamless and efficient.

There are different frameworks for machine learning operations; in this article, we learned about one such framework, MLRun.

MLRun is an open-source MLOps framework that provides seamless and efficient management of your machine learning pipeline from early development to full production deployment. MLRun offers a lot of functionality, which you can read about in detail in its GitHub repository.

You can also follow the official examples and tutorials.

I hope you have learned something from this blog; do share it with others. Check out my personal machine learning blog (https://code-ml.com/) for new and exciting content on different domains of ML and AI.

About the Author

Mohammad Ahmad - Research Engineer
LinkedIn - https://www.linkedin.com/in/mohammad-ahmad-ai/
Personal Blog - https://code-ml.com/
GitHub - https://github.com/ahmadkhan242
Twitter - https://twitter.com/ahmadkhan_242

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
