With the advent of Deep Learning and Machine Learning libraries, model prototyping has become very convenient. Consequently, the time taken to build a fairly accurate model has dropped drastically. But the business impact of a model can be observed only after the model is deployed, and thus exposed to real-world data.
Model deployment is a key component of the Machine Learning pipeline. Deploying a model poses several challenges, such as model versioning and containerization. Web frameworks like Flask and Django can be used to wrap the model in a REST API and expose it, but this approach requires developers to write and maintain the code that handles requests to the model and supports other deployment-related features.
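As a rough illustration of that manual approach, a hand-rolled prediction endpoint might look like the sketch below; the model path, input format, and endpoint name are assumptions for illustration, not details from the talk.

```python
# Minimal sketch of wrapping a model in a REST API by hand with Flask.
# The model path and JSON input format are illustrative assumptions.
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)
model = tf.keras.models.load_model("my_model")  # hypothetical saved model

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a body like {"instances": [[1.0, 2.0, 3.0, 4.0]]}
    instances = np.array(request.get_json()["instances"])
    predictions = model.predict(instances)
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Everything beyond this one endpoint (versioning, batching, monitoring, reloading new models) is left for the developer to build and maintain, which is exactly the burden TensorFlow Serving is meant to remove.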
To tackle this problem, TensorFlow introduced TensorFlow Serving, a flexible, high-performance serving system for machine learning models designed for production environments. The goal of this talk is to give a brief introduction to TensorFlow Serving and illustrate its features using an example use case.
Key Takeaways for the Audience
- How to deploy models using TensorFlow Serving (see the export and serving sketch after this list)
- How to interact with the served model using gRPC or REST (see the client sketches below)
- How to deploy different versions of a model using TensorFlow Serving (see the versioning sketch below)
- Advanced configuration in TensorFlow Serving (see the model config sketch below)
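The sketches below are illustrative only; the model name (my_model), the 4-feature input, the paths, and the ports are assumptions, not details from the session. First, deployment: TensorFlow Serving expects a model exported in the SavedModel format under a numbered version directory, which can then be served, for example, with the official Docker image.

```python
# Sketch: export a Keras model as a SavedModel under a numbered version directory,
# the layout TensorFlow Serving expects. Model architecture and paths are assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Version "1" of the model lives under the base path /tmp/serving/my_model.
tf.saved_model.save(model, "/tmp/serving/my_model/1")

# The model can then be served, e.g. with the official Docker image:
#   docker run -p 8500:8500 -p 8501:8501 \
#     -v /tmp/serving/my_model:/models/my_model \
#     -e MODEL_NAME=my_model tensorflow/serving
# Port 8500 exposes the gRPC API and 8501 the REST API.
```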
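Once the server is up, the model can be queried over REST (port 8501) or gRPC (port 8500). Both client sketches below assume the model name, ports, and input shape from the previous sketch; the input tensor name in the gRPC request is also an assumption and depends on the model's serving signature.

```python
# Sketch: REST client for a model served by TensorFlow Serving.
import json
import requests

payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}
response = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=json.dumps(payload),
)
print(response.json())  # e.g. {"predictions": [[...]]}
```

```python
# Sketch: the same prediction over gRPC, using the tensorflow-serving-api package.
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

grpc_request = predict_pb2.PredictRequest()
grpc_request.model_spec.name = "my_model"
grpc_request.model_spec.signature_name = "serving_default"
# "dense_input" is a placeholder; the real tensor name comes from the model's signature.
grpc_request.inputs["dense_input"].CopyFrom(
    tf.make_tensor_proto([[1.0, 2.0, 3.0, 4.0]], dtype=tf.float32)
)

result = stub.Predict(grpc_request, 10.0)  # 10-second timeout
print(result.outputs)
```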
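Version management follows from the directory layout: TensorFlow Serving watches the model base path and, by default, loads the highest-numbered version it finds and unloads the previous one. A sketch, continuing the assumptions above:

```python
# Sketch: exporting a second version next to the first one.
# With the default version policy, TensorFlow Serving picks up version 2
# automatically and stops serving version 1.
import tensorflow as tf

new_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

tf.saved_model.save(new_model, "/tmp/serving/my_model/2")
```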
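Finally, advanced configuration: a model config file lets a single server host several models and control which versions of each stay loaded. The sketch below writes such a file from Python; the model names, paths, and version numbers are assumptions.

```python
# Sketch: a model_config_list file for serving multiple models and pinning versions.
MODEL_CONFIG = """
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy { specific { versions: 1 versions: 2 } }
  }
  config {
    name: "another_model"
    base_path: "/models/another_model"
    model_platform: "tensorflow"
  }
}
"""

with open("/tmp/serving/models.config", "w") as f:
    f.write(MODEL_CONFIG)

# The server is then pointed at the file, e.g.:
#   tensorflow_model_server --model_config_file=/models/models.config
```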
Check out the video below to learn more about the session.