Anyone involved in the data science development process knows how difficult it can be to get your model into production. It’s all well and good to have achieved a benchmark solution but if you can’t get your code into production, it essentially becomes meaningless. There are multiple challenges in machine learning development.
Databricks, founded by the creators of Apache Spark, have released a unified solution to all machine learning framework challenges – MLflow. It is an open source machine learning platform that manages the entire ML lifecycle (from start to production) and is designed to work with any ML library.
In a blog post announcing the release of MLflow, Databricks have listed the reasons why they decided to develop this tool. They have seen companies struggle to manage ML workflows in several ways. From data preparation to model training, data scientists use a myriad of tools to validate how good their system is. This requires productionizing a lot of libraries, something that is beyond most organizations. Also, reproducing the steps of a workflow is critical but can often be difficult to do without detailed tracking. And of course, getting the model into production is the hardest part. There are potentially multiple tools and environments for deploying and there is no standard way to move models from any library to any of these tools.
MLflow can work with any ML library, algorithm, deployment tool or language. Other advantages it offers are:
If you have existing code, MLflow can be used with that as well! Since it is open source, you can even share your framework and models across organizations (assuming you also want to open source your code, obviously).
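The current version of MLflow has three components: MLflow Tracking (an API and UI for logging parameters, code versions, metrics and output files), MLflow Projects (a packaging format for reproducible runs) and MLflow Models (a format for packaging models so they can be deployed to different tools).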
The team is working on adding more components like monitoring the progress of your model. You can install MLflow right now using pip:
pip install mlflow
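Once it is installed, a first experiment with the tracking API could look something like the sketch below. This is only an illustration under a few assumptions: the toy least-squares model and the helper names fit_line and train_and_log are made up for this example, and the only MLflow calls used are mlflow.start_run, mlflow.log_param and mlflow.log_metric.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    Hypothetical toy model for this example; returns (slope, intercept, rmse).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Root mean squared error of the fitted line on the training points.
    squared_errors = ((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
    rmse = (sum(squared_errors) / n) ** 0.5
    return slope, intercept, rmse


def train_and_log(xs, ys):
    """Fit the toy model and record the run with MLflow Tracking."""
    # Imported here so fit_line stays usable without MLflow installed.
    import mlflow

    slope, intercept, rmse = fit_line(xs, ys)
    with mlflow.start_run():
        # Parameters describe the run's inputs; metrics describe its results.
        mlflow.log_param("n_points", len(xs))
        mlflow.log_metric("slope", slope)
        mlflow.log_metric("intercept", intercept)
        mlflow.log_metric("rmse", rmse)
    return rmse
```

Calling train_and_log([0, 1, 2, 3], [1, 3, 5, 7]) should record a single run in MLflow's default tracking store, which you can then browse by running the mlflow ui command. Keeping the model code separate from the logging code is just a design choice for this sketch; it makes it easy to see how little has to change in existing code.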
The project is currently in alpha but the developers feel it’s already good enough to be integrated into an organisation’s current environment. You can check out and follow their repository on GitHub.
The likes of Facebook, Google and Uber have their own internal frameworks for machine learning workflows, but even these platforms are limited in their own way. Most of them support only built-in algorithms and are tied to the infrastructure in place at each organization. Not the most flexible way to work.
Some of the alternatives to MLflow you can check out are SageMaker, Sacred and FGLab. I feel MLflow has better options than these but you are free to make up your own mind!
I like the concept and am looking forward to them adding the aforementioned components like monitoring the progress of your models. This is another example of the ML community giving back to everyone by making such a breakthrough tool open source. If you try it out, do let us know in the comments below!