This article was published as a part of the Data Science Blogathon
Like DevOps, MLOps (machine learning operations) is a set of practices that aims to make developing and maintaining production machine learning seamless and efficient. MLOps seeks to increase automation and improve the quality of production models while also addressing business and regulatory requirements. A common MLOps architecture includes data science platforms, where models are constructed, and analytical engines, where computations are performed. The MLOps tool orchestrates the movement of machine learning models, data, and outcomes between these systems. Goals that enterprises want to achieve through MLOps systems include rapid deployment, pipeline automation, feature and log management, and reproducibility of models and predictions.
MLRun is an open-source MLOps framework that provides seamless and efficient management of your machine learning pipeline from early development to full production deployment.
Key benefits provided by the MLRun framework include:
MLRun is composed of several convenient abstraction layers that provide features across a wide variety of technologies, such as automating the build process, execution, data movement, scaling, versioning, parameterization, output tracking, and more. In every ML experiment, we want to save our code, configuration, results, logs, inputs, outputs, and so on, so that we can reproduce the experiment in different development environments. MLRun helps to manage, save, and reproduce experiments without hassle.
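To make concrete what "saving an experiment" involves, here is a minimal plain-Python sketch that records a run's configuration and results together with a fingerprint of the configuration, so two runs can be compared later. It does not use MLRun: the names `config_fingerprint` and `save_experiment`, the JSON layout, and the sample values are all hypothetical and only illustrate the kind of bookkeeping that a framework like MLRun automates.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def config_fingerprint(config: dict) -> str:
    """Return a stable hash of a config dict (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def save_experiment(directory: Path, config: dict, results: dict) -> Path:
    """Write config, results, and the config fingerprint to one JSON file."""
    record = {
        "fingerprint": config_fingerprint(config),
        "config": config,
        "results": results,
    }
    target = directory / f"run-{record['fingerprint'][:8]}.json"
    target.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return target


# Example usage with made-up values (not real experiment results).
with tempfile.TemporaryDirectory() as tmp:
    saved = save_experiment(
        Path(tmp),
        config={"model": "logistic_regression", "C": 1.0},
        results={"accuracy": 0.9},
    )
    loaded = json.loads(saved.read_text(encoding="utf-8"))
```

Because the fingerprint depends only on the configuration, two runs with the same settings get the same fingerprint, which is one simple way to tell whether a result should be reproducible.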
MLRun is composed of the following layers:
Read more about the MLRun framework in its GitHub repository.
The architecture consists of several basic components; combining these components creates a pipeline.
To install MLRun on your device, run the following command in your terminal:
pip install mlrun
Let’s discuss some of the main components of MLRun with examples.
A project is a container that holds all your source code, metadata, artifacts, logs, models, etc. It helps organize all of your activities related to the ML experiment.
Define the project name, then call mlrun.set_environment to set it.
    from os import path
    import mlrun

    # Mention your project name here
    project_name_base = 'Project_name'
    project_name, artifact_path = mlrun.set_environment(project=project_name_base, user_project=True)
    print(f'Project name: {project_name}')
Output (because user_project=True, MLRun appends the current username to the project name):
    Project name: Project_name-<username>
Functions are small packages that we write to execute the individual steps of our pipeline. These steps include, but are not limited to, fetching data, transforming data, training multiple models, and testing. Below is a simple example of a function that fetches data from MongoDB Atlas.
A function can be created in four different ways.
Here we define a simple Python function, store it in a source file, and use mlrun.code_to_function to create a function object.
    from os import path

    import pandas as pd
    import pymongo
    from mlrun.datastore import DataItem
    from mlrun.execution import MLClientCtx


    def fetch_data(context: MLClientCtx, data_path: DataItem):
        context.logger.info('Reading data from {}'.format(data_path))

        # Connect to MongoDB and select the database and collection
        m_client = pymongo.MongoClient("Mention The Link of Your MongoDB Client Here")
        m_db = m_client["DB_name"]
        db_cm = m_db["DB_name"]

        # Load every document of the collection into a DataFrame
        df = pd.DataFrame.from_records(db_cm.find())
        suicide_dataset = df

        target_path = path.join(context.artifact_path, 'data')
        context.logger.info('Saving datasets to {} ...'.format(target_path))

        # Store the data sets in your artifacts database
        context.log_dataset('suicide_dataset', df=suicide_dataset, format='csv',
                            index=False, artifact_path=target_path)
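Before wiring a handler like the one above into MLRun, it can help to exercise its read-then-log logic without a MongoDB server. The sketch below is a stand-in built on stated assumptions: `FakeCollection` mimics only the `find()` call, `StubContext` mimics only `artifact_path` and a simplified `log_dataset`, and the standard `csv` module replaces pandas. None of these classes are part of MLRun or pymongo, and the sample records are invented.

```python
import csv
import io
from os import path


class FakeCollection:
    """Hypothetical stand-in for a pymongo collection; supports find() only."""

    def __init__(self, records):
        self._records = records

    def find(self):
        return iter(self._records)


class StubContext:
    """Hypothetical stand-in for the MLRun context; records logged datasets."""

    def __init__(self, artifact_path):
        self.artifact_path = artifact_path
        self.logged = {}

    def log_dataset(self, key, rows, artifact_path):
        # Simplified signature (not MLRun's): serialize rows to CSV text
        # in memory instead of writing to an artifact store.
        fieldnames = list(rows[0].keys()) if rows else []
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        self.logged[key] = (artifact_path, buffer.getvalue())


def fetch_data(context, collection):
    """Same shape as the handler above: read all records, then log them."""
    rows = list(collection.find())
    target_path = path.join(context.artifact_path, "data")
    context.log_dataset("suicide_dataset", rows, artifact_path=target_path)
    return len(rows)


# Invented sample records, used only to drive the dry run.
context = StubContext(artifact_path="artifacts")
collection = FakeCollection([
    {"country": "A", "year": 2000},
    {"country": "B", "year": 2001},
])
row_count = fetch_data(context, collection)
```

Passing the collection in as an argument, rather than creating the client inside the handler, is the design choice that makes this kind of dry run possible.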
When a function is executed, all information about the execution is stored in an object known as the run object. It is created whenever you run a function, and it stores information such as the function's attributes (arguments, inputs, and outputs), results, and logs.
We first define the function object, which can be used to execute any function defined in the source file:
    func_obj = mlrun.code_to_function(name='f_obj', kind='job', filename='Path of the Source code')
    fetch_data_run_obj = func_obj.run(handler='fetch_data',
                                      inputs={'data_path': 'Mention Path of the DATA CSV'},
                                      local=True)
We use this object to run our function: handler takes the name of the function to execute, and inputs takes the function's input arguments.
fetch_data_run_obj.outputs
This returns the outputs of the function, in this case the fetched dataset.
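To make the idea of a run object more concrete, the following plain-Python sketch looks up a handler by name, calls it with the given inputs, and keeps the inputs, outputs, logs, and final state together in one record. It is a conceptual illustration only and not how MLRun implements runs: `RunRecord`, `run_handler`, `describe_input`, and the sample path are all hypothetical.

```python
import os
import time
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical, simplified picture of what a run object keeps track of."""

    handler: str
    inputs: dict
    outputs: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)
    state: str = "created"
    duration_seconds: float = 0.0


def run_handler(handlers: dict, handler: str, inputs: dict) -> RunRecord:
    """Look up a handler by name, call it with the inputs, and record the run."""
    record = RunRecord(handler=handler, inputs=dict(inputs))
    started = time.perf_counter()
    try:
        record.logs.append(f"starting {handler}")
        record.outputs["return"] = handlers[handler](**inputs)
        record.state = "completed"
    except Exception as exc:
        # Keep the failure in the record instead of raising it to the caller.
        record.logs.append(f"error: {exc}")
        record.state = "error"
    record.duration_seconds = time.perf_counter() - started
    return record


def describe_input(data_path: str) -> dict:
    """Toy handler: report the path it was given and its file extension."""
    return {"data_path": data_path, "extension": os.path.splitext(data_path)[1]}


handlers = {"describe_input": describe_input}
run = run_handler(handlers, "describe_input", {"data_path": "data/sample.csv"})
failed = run_handler(handlers, "missing_handler", {})
```

Here `run.outputs` plays the role that `fetch_data_run_obj.outputs` plays above, and `failed.state` shows why keeping the state in the record is useful: a failed step can be inspected afterwards rather than lost.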
Artifacts are the data objects (such as datasets, graphs, pickle files, and models) that are produced or used by functions, runs, and workflows. We pass an artifact directory name; this is the directory where you want to store your data. The directory structure is given below:
─── Artifact directory
├── Data
├── data (All your datasets)
├── model (Saved model and model config)
├── artifacts/project_name-username (Contains all your artifact data)
├── functions/project_name-username (Contains all your function data)
├── runs/project_name-username (Contains all your run object data)
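To check what a run actually wrote, you can walk the artifact directory and print it in the same tree style as the listing above. The helper below is a small standard-library sketch; `render_tree` is a hypothetical name, and the demo directory and file names are placeholders rather than output from a real MLRun run.

```python
import tempfile
from pathlib import Path


def render_tree(root: Path, prefix: str = "") -> list:
    """Return the directory tree under root as a list of text lines."""
    lines = []
    entries = sorted(root.iterdir(), key=lambda p: p.name)
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "└── " if is_last else "├── "
        lines.append(f"{prefix}{connector}{entry.name}")
        if entry.is_dir():
            # Children of the last entry need no vertical guide line.
            extension = "    " if is_last else "│   "
            lines.extend(render_tree(entry, prefix + extension))
    return lines


# Build a throwaway directory to demonstrate; the names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "suicide_dataset.csv").write_text("a,b\n", encoding="utf-8")
    (root / "model").mkdir()
    tree_lines = render_tree(root)

print("\n".join(tree_lines))
```

Pointing `render_tree` at your own artifact directory gives a quick view of which datasets, models, and run records are present.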
One of the most difficult parts of machine learning development is deploying models to production and managing them there. MLOps defines a set of practices that aim to make developing and maintaining production machine learning seamless and efficient.
There are several frameworks for machine learning operations; in this article we learned about one of them, MLRun.
MLRun is an open-source MLOps framework that provides seamless and efficient management of your machine learning pipeline from early development to full production deployment. MLRun has a lot of functionality available; you can read about it in detail in the project's GitHub repository.
Follow the official examples and tutorials here.
I hope you have learned something from this blog; do share it with others. Check out my personal machine learning blog (https://code-ml.com/) for new and exciting content on different domains of ML and AI.
Mohammad Ahmad - Research Engineer
LinkedIn - https://www.linkedin.com/in/mohammad-ahmad-ai/
Personal Blog - https://code-ml.com/
GitHub - https://github.com/ahmadkhan242
Twitter - https://twitter.com/ahmadkhan_242
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.