Have you ever wondered whether a system could predict the efficiency of electric vehicles and still be easy for anyone to use? Such a system is now practical to build, thanks to ZenML and MLflow. In this project, we take a technical deep dive into how data science, machine learning, and MLOps combine to make it work, and how ZenML is used to predict electric vehicle efficiency.
In this article, we will learn,
This article was published as a part of the Data Science Blogathon.
For this project, we start by collecting data from Kaggle, an online platform offering many datasets for data science and machine learning projects. You can collect data from any other source you prefer. This dataset is what we train our model on and use for predictions. All the files and templates are in my GitHub repository – https://github.com/Dhrubaraj-Roy/Predicting-Electric-Vehicle-Efficiency.git
Efficient electric vehicles are the future, but predicting their range accurately is very difficult.
Our project combines data science and MLOps to create a precise model for forecasting electric vehicle efficiency, benefiting consumers and manufacturers.
Why do we want to set up a Virtual Environment?
It keeps our project isolated, so that its dependencies do not conflict with other projects on our system.
Creating a Virtual Environment
On Windows:
python -m venv myenv
# then, to activate
myenv\Scripts\activate
On macOS or Linux:
python3 -m venv myenv
# then, to activate
source myenv/bin/activate
It helps keep our environment clean.
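To confirm that Python is actually using the activated environment, a quick check can be run from the interpreter. This is a minimal sketch using only the standard library; the helper name `in_virtualenv` is our own and is not part of any tool mentioned here.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Inside a virtual environment:", in_virtualenv())
print("Interpreter prefix:", sys.prefix)
```

If the first line prints False after activation, the shell is most likely still picking up the system-wide interpreter.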
With our environment ready, we need to install ZenML. ZenML is a machine learning operations (MLOps) framework for managing end-to-end machine learning pipelines, and we chose it because it manages those pipelines efficiently. To get started, install ZenML together with its server component.
Use this command in your terminal to install the ZenML server –
pip install "zenml[server]"
After installing the ZenML server, we need to create a ZenML repository. To do that, run –
zenml init
Why We Use `zenml init`: `zenml init` is used to initialize a ZenML repository, creating the structure necessary to manage machine learning pipelines and experiments effectively.
To manage project dependencies, we use a ‘requirements.txt’ file. It should list the following dependencies, which you can install with ‘pip install -r requirements.txt’.
catboost==1.0.4
joblib==1.1.0
lightgbm==3.3.2
optuna==2.10.0
streamlit==1.8.1
xgboost==1.5.2
markupsafe==1.1.1
zenml==0.35.1
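Because every dependency is pinned to an exact version, it can be useful to check that the active environment really matches the pins. The sketch below is one way to do that with the standard library's `importlib.metadata`; the helper names `parse_pins` and `find_mismatches` are our own, and the script is an illustration rather than part of the project.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text: str) -> dict:
    """Parse 'name==version' lines into a {name: version} dictionary."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip().lower()] = pinned.strip()
    return pins


def find_mismatches(pins: dict) -> dict:
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


REQUIREMENTS = """\
catboost==1.0.4
joblib==1.1.0
lightgbm==3.3.2
optuna==2.10.0
streamlit==1.8.1
xgboost==1.5.2
markupsafe==1.1.1
zenml==0.35.1
"""

pins = parse_pins(REQUIREMENTS)
for name, (pinned, installed) in find_mismatches(pins).items():
    print(f"{name}: pinned {pinned}, installed {installed}")
```

An empty output means every pinned package is installed at the expected version.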
When working on a data science project, we should organize everything properly. Let me break down how we keep things structured in our project:
We organize our project into folders. There are some folders we need to create.
This is where we assemble our pipeline, similar to setting up a production line for your project. Inside the ‘Pipelines’ folder, ‘Training_pipeline.py’ acts as the primary production machine. In this file, we import ‘ingest_df’ from ‘Ingest_data.py’, along with the other steps that clean the data, train the model, and evaluate its performance. To run the entire project, use ‘run_pipeline.py’, which is like pushing the start button on your production line:
python run_pipeline.py
Here, you can see the file structure of the project-
This structure helps us to run our project smoothly, just like a well-structured workspace helps you create a project effectively.
3. Setting up Pipeline
After organizing the project and configuring the pipeline, the next step is to execute it. Now, you might have a question: what is a pipeline? A pipeline is a set of automated steps that streamline the deployment, monitoring, and management of machine learning models from development to production. Running ‘python run_pipeline.py’ acts as the power switch for your production line: it executes all the defined steps of your data science project in the correct sequence, from data ingestion and cleaning to model training and evaluation. The ‘zenml up’ command then starts the local ZenML dashboard, where you can inspect the pipeline and its runs.
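The idea of a pipeline as an ordered chain of steps can be illustrated without any framework. The sketch below is plain Python, not the ZenML API; the step functions and the toy data are made up for illustration and only mirror the ingest, clean, train, evaluate order described above.

```python
from statistics import mean


def ingest():
    """Stand-in for data ingestion: returns a few toy efficiency readings."""
    return [168.0, None, 172.0, 180.0, None, 176.0]


def clean(rows):
    """Stand-in for data cleaning: drops missing values."""
    return [r for r in rows if r is not None]


def train(rows):
    """Stand-in for training: the 'model' is just the mean of the data."""
    return mean(rows)


def evaluate(model, rows):
    """Stand-in for evaluation: mean absolute error of the constant model."""
    return mean(abs(r - model) for r in rows)


def run_pipeline():
    """Run the steps in a fixed order, passing each output to the next step."""
    raw = ingest()
    cleaned = clean(raw)
    model = train(cleaned)
    error = evaluate(model, cleaned)
    return model, error


model, error = run_pipeline()
print(f"model={model}, mean absolute error={error}")
```

A framework such as ZenML adds caching, tracking, and a dashboard on top of this basic idea, but the data still flows from one step to the next in the same way.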
In the ‘Model’ folder, you’ll find a file called ‘data_cleaning’, which is responsible for data cleaning. Within this file, you’ll find – Column Cleanup: a section dedicated to identifying and removing unnecessary columns from the dataset, making it tidier and easier to work with. DataDivideStrategy Class: this class defines how we divide our data into training and test sets. It’s like planning how to arrange your materials for your project.
class DataDivideStrategy(DataStrategy):
    """
    Data dividing strategy which divides the data into train and test data.
    """
    def handle_data(self, data: pd.DataFrame) -> Union[pd.DataFrame, pd.Series]:
        """
        Divides the data into train and test data.
        """
        try:
            # Assuming "Efficiency" is your target variable
            # Separating the features (X) and the target (y) from the dataset
            X = data.drop("Efficiency", axis=1)
            y = data["Efficiency"]
            # Splitting the data into training and testing sets with an 80-20 split
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, random_state=42
            )
            # Returning the divided datasets
            return X_train, X_test, y_train, y_test
        except Exception as e:
            # Logging an error message (including the exception) if anything fails
            logging.error("Error in dividing the data into train and test data: {}".format(e))
            raise e
Now, we move on to the ‘Steps’ folder. Inside, there’s a file called ‘clean_data.py.’ This file is dedicated to data cleaning. Here’s what happens here:
import logging
from typing import Tuple
import pandas as pd
from model.data_cleaning import DataCleaning, DataDivideStrategy, DataPreProcessStrategy
from zenml import step
from typing_extensions import Annotated

@step
def clean_df(data: pd.DataFrame) -> Tuple[
    Annotated[pd.DataFrame, 'X_train'],
    Annotated[pd.DataFrame, 'X_test'],
    Annotated[pd.Series, 'y_train'],
    Annotated[pd.Series, 'y_test'],
]:
    """
    Data cleaning step which preprocesses the data and divides it into train and test data.
    Args:
        data: pd.DataFrame
    """
    try:
        # First pass: preprocess the raw data
        preprocess_strategy = DataPreProcessStrategy()
        data_cleaning = DataCleaning(data, preprocess_strategy)
        preprocessed_data = data_cleaning.handle_data()
        # Second pass: split the preprocessed data into train and test sets
        divide_strategy = DataDivideStrategy()
        data_cleaning = DataCleaning(preprocessed_data, divide_strategy)
        X_train, X_test, y_train, y_test = data_cleaning.handle_data()
        logging.info("Data Cleaning Complete")
        return X_train, X_test, y_train, y_test
    except Exception as e:
        logging.error(e)
        raise e
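The `DataCleaning` class and the `DataStrategy` base class used above are not listed in the article. A minimal sketch of how such a strategy pattern might be wired together is shown below. It is plain Python over lists of dictionaries rather than pandas, and every class body here (including the hypothetical `DropMissingStrategy` and `DivideStrategy`) is an assumption, not the repository's code.

```python
import random
from abc import ABC, abstractmethod


class DataStrategy(ABC):
    """Interface every data-handling strategy implements."""

    @abstractmethod
    def handle_data(self, rows):
        ...


class DropMissingStrategy(DataStrategy):
    """Hypothetical preprocessing: drop rows with any missing value."""

    def handle_data(self, rows):
        return [r for r in rows if all(v is not None for v in r.values())]


class DivideStrategy(DataStrategy):
    """Hypothetical split: seeded shuffle, then an 80/20 train/test cut."""

    def __init__(self, target="Efficiency", test_size=0.2, seed=42):
        self.target, self.test_size, self.seed = target, test_size, seed

    def handle_data(self, rows):
        shuffled = rows[:]
        random.Random(self.seed).shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * self.test_size))
        test, train = shuffled[:n_test], shuffled[n_test:]

        def split(part):
            # Separate the feature columns from the target column.
            X = [{k: v for k, v in r.items() if k != self.target} for r in part]
            y = [r[self.target] for r in part]
            return X, y

        (X_train, y_train), (X_test, y_test) = split(train), split(test)
        return X_train, X_test, y_train, y_test


class DataCleaning:
    """Context object: delegates the work to whichever strategy it is given."""

    def __init__(self, data, strategy: DataStrategy):
        self.data, self.strategy = data, strategy

    def handle_data(self):
        return self.strategy.handle_data(self.data)


rows = [{"Range": 300 + i, "Efficiency": 160 + i} for i in range(9)]
rows.append({"Range": None, "Efficiency": 170})  # one incomplete row

cleaned = DataCleaning(rows, DropMissingStrategy()).handle_data()
X_train, X_test, y_train, y_test = DataCleaning(cleaned, DivideStrategy()).handle_data()
print(len(X_train), len(X_test))
```

The point of the pattern is that `DataCleaning` never changes: new behaviour is added by writing another strategy class and passing it in, which is how `clean_df` reuses the same class for both preprocessing and splitting.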
Now, let’s talk about the ‘model_dev’ file in the ‘model’ folder. In this file, we build the machine learning model.
This structured approach ensures that we have a clean and organized data-cleaning process, and our model development follows a clear blueprint, keeping the focus on MLOps efficiency rather than building an intricate model. In the future, we will update our model.
import logging
from abc import ABC, abstractmethod
import pandas as pd
from sklearn.linear_model import LinearRegression
from typing import Dict
import optuna  # Import the optuna library
# Rest of your code...

class Model(ABC):
    """
    Abstract base class for all models.
    """
    @abstractmethod
    def train(self, X_train, y_train):
        """
        Trains the model on the given data.
        Args:
            X_train: Training data
            y_train: Target data
        """
        pass

class LinearRegressionModel(Model):
    """
    LinearRegressionModel that implements the Model interface.
    """
    def train(self, X_train, y_train, **kwargs):
        try:
            reg = LinearRegression(**kwargs)  # Create a Linear Regression model
            reg.fit(X_train, y_train)  # Fit the model to the training data
            # Log a message indicating training is complete
            logging.info('Training complete')
            # Return the trained model
            return reg
        except Exception as e:
            # Log an error message if an exception occurs
            logging.error("error in training model: {}".format(e))
            # Raise the exception for further handling
            raise e
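One benefit of the abstract base class is that any new model has to provide `train` before it can be instantiated. The sketch below repeats a trimmed `Model` interface so that it runs on its own and adds a made-up `MeanBaselineModel`; that class and `BrokenModel` are purely illustrative and are not part of the project.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Trimmed copy of the abstract interface, repeated so the sketch runs alone."""

    @abstractmethod
    def train(self, X_train, y_train):
        ...


class MeanBaselineModel(Model):
    """Hypothetical baseline: always predicts the mean of the training targets."""

    def train(self, X_train, y_train, **kwargs):
        self.mean_ = sum(y_train) / len(y_train)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class BrokenModel(Model):
    """Forgets to implement train(), so Python refuses to instantiate it."""


baseline = MeanBaselineModel().train([[1], [2], [3]], [10.0, 20.0, 30.0])
print(baseline.predict([[4], [5]]))

try:
    BrokenModel()
except TypeError as exc:
    print("Cannot instantiate:", exc)
```

A baseline like this is also a handy sanity check: a real regression model should beat it on the evaluation metrics.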
In the ‘model_train.py’ file, we make several important additions to our project:
Importing Linear Regression Model: We import ‘LinearRegressionModel’ from ‘model.model_dev’. Our ‘model_train.py’ file is set up to work with this specific type of machine-learning model.
def train_model(
    X_train: pd.DataFrame,
    X_test: pd.DataFrame,
    y_train: pd.Series,
    y_test: pd.Series,
    config: ModelNameConfig,
) -> RegressorMixin:
    """
    Train a regression model based on the specified configuration.
    Args:
        X_train (pd.DataFrame): Training data features.
        X_test (pd.DataFrame): Testing data features.
        y_train (pd.Series): Training data target.
        y_test (pd.Series): Testing data target.
        config (ModelNameConfig): Model configuration.
    Returns:
        RegressorMixin: Trained regression model.
    """
    try:
        model = None
        # Check the specified model in the configuration
        if config.model_name == "linear_regression":
            # Enable MLflow auto-logging
            autolog()
            # Create an instance of the LinearRegressionModel
            model = LinearRegressionModel()
            # Train the model on the training data
            trained_model = model.train(X_train, y_train)
            # Return the trained model
            return trained_model
        else:
            # Raise an error if the model name is not supported
            raise ValueError("Model name not supported")
    except Exception as e:
        # Log and raise any exceptions that occur during model training
        logging.error(f"Error in train model: {e}")
        raise e
This code trains a regression model (e.g., linear regression) based on a chosen configuration. It checks if the selected model is supported, uses MLflow for logging, trains the model on provided data, and returns the trained model. If the chosen model is not supported, it will raise an error.
Method ‘Train Model’: The ‘model_train.py’ file defines a function called ‘train_model’, which trains a ‘LinearRegressionModel’ and returns the fitted regressor.
Importing RegressorMixin: We import ‘RegressorMixin’ from sklearn.base. RegressorMixin is the scikit-learn mixin class shared by regression estimators, so it works as the return type for any trained regressor. sklearn.base is part of scikit-learn, a library for building and working with machine learning models.
Create ‘config.py’ in the ‘Steps’ folder: In the ‘steps’ folder, we create a file named ‘config.py’. This file contains a class called ‘ModelNameConfig’, which serves as the configuration for your machine learning model: it specifies which model to train and whether fine-tuning is enabled.
# Import the necessary class from ZenML for configuring model parameters
from zenml.steps import BaseParameters

# Define a class named ModelNameConfig that inherits from BaseParameters
class ModelNameConfig(BaseParameters):
    """
    Model Configurations:
    """
    # Define attributes for model configuration with default values
    model_name: str = "linear_regression"  # Name of the machine learning model
    fine_tuning: bool = False  # Flag for enabling fine-tuning
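How a configuration object can drive model selection is sketched below, with a standard-library dataclass standing in for ZenML's `BaseParameters`. The `MODEL_REGISTRY` dictionary, the `select_model` helper, and the placeholder model class are assumptions made for illustration; the project itself uses an if/else check inside `train_model`.

```python
from dataclasses import dataclass


@dataclass
class ModelNameConfig:
    """Stand-in for the ZenML parameter class, with the same two fields."""
    model_name: str = "linear_regression"
    fine_tuning: bool = False


class LinearRegressionPlaceholder:
    """Placeholder for the real model class; only records that it was chosen."""
    name = "linear_regression"


# Hypothetical registry: maps a configured name to the class that implements it.
MODEL_REGISTRY = {"linear_regression": LinearRegressionPlaceholder}


def select_model(config: ModelNameConfig):
    """Return a model instance for the configured name or raise ValueError."""
    try:
        model_cls = MODEL_REGISTRY[config.model_name]
    except KeyError:
        raise ValueError(f"Model name not supported: {config.model_name!r}") from None
    return model_cls()


print(type(select_model(ModelNameConfig())).__name__)

try:
    select_model(ModelNameConfig(model_name="random_forest"))
except ValueError as exc:
    print(exc)
```

With a registry, supporting another model means adding one dictionary entry instead of another branch, while unsupported names still fail loudly.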
Method ‘Evaluate Model‘: In ‘evaluation.py’ within the ‘steps’ folder, we create a method called ‘evaluate_model’ that returns performance metrics like R-squared (R2) score and Root Mean Squared Error (RMSE).
@step(experiment_tracker=experiment_tracker.name)
def evaluate_model(
    model: RegressorMixin, X_test: pd.DataFrame, y_test: pd.Series
) -> Tuple[
    Annotated[float, "r2"],
    Annotated[float, "rmse"],
]:
    """
    Evaluate a machine learning model's performance using various metrics and log the results.
    Args:
        model: RegressorMixin - The machine learning model to evaluate.
        X_test: pd.DataFrame - The test dataset's feature values.
        y_test: pd.Series - The actual target values for the test dataset.
    Returns:
        Tuple[float, float] - A tuple containing the R2 score and RMSE.
    """
    try:
        # Make predictions using the model
        prediction = model.predict(X_test)
        # Calculate Mean Squared Error (MSE) using the MSE class
        mse_class = MSE()
        mse = mse_class.calculate_score(y_test, prediction)
        mlflow.log_metric("mse", mse)
        # Calculate R2 score using the R2 class
        r2_class = R2()
        r2 = r2_class.calculate_score(y_test, prediction)
        mlflow.log_metric("r2", r2)
        # Calculate Root Mean Squared Error (RMSE) using the RMSE class
        rmse_class = RMSE()
        rmse = rmse_class.calculate_score(y_test, prediction)
        mlflow.log_metric("rmse", rmse)
        return r2, rmse  # Return R2 score and RMSE
    except Exception as e:
        # Include the exception in the log message
        logging.error("error in evaluation: {}".format(e))
        raise e
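The `MSE`, `R2`, and `RMSE` helper classes called in `evaluate_model` are not listed in the article. A minimal sketch of what they might compute is shown below, assuming each exposes a `calculate_score(y_true, y_pred)` method as the step implies. The `Evaluation` base class is our own addition, and the bodies are written in plain Python here; the project's versions may well wrap scikit-learn instead.

```python
import math
from abc import ABC, abstractmethod


class Evaluation(ABC):
    """Assumed common interface for the metric classes."""

    @abstractmethod
    def calculate_score(self, y_true, y_pred) -> float:
        ...


class MSE(Evaluation):
    """Mean squared error: average of the squared residuals."""

    def calculate_score(self, y_true, y_pred) -> float:
        return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


class RMSE(Evaluation):
    """Root mean squared error: square root of the MSE, in target units."""

    def calculate_score(self, y_true, y_pred) -> float:
        return math.sqrt(MSE().calculate_score(y_true, y_pred))


class R2(Evaluation):
    """Coefficient of determination: 1 - SS_res / SS_tot."""

    def calculate_score(self, y_true, y_pred) -> float:
        mean_true = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean_true) ** 2 for t in y_true)
        return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print("mse :", MSE().calculate_score(y_true, y_pred))
print("rmse:", RMSE().calculate_score(y_true, y_pred))
print("r2  :", R2().calculate_score(y_true, y_pred))
```

In this toy sample, the prediction is off by one on a single point, so the MSE is 0.25, the RMSE is 0.5, and R2 is 0.8. RMSE is usually the easiest of the three to interpret because it is expressed in the same units as the target.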
These additions in ‘model_train.py,’ ‘config.py,’ and ‘evaluation.py’ enhance our project by introducing machine learning model training, configuration, and thorough evaluation, ensuring that our project meets high-quality standards.
Next, we update the ‘training_pipeline’ file so that the pipeline includes these new steps and runs successfully. To see your pipeline in the ZenML dashboard, you can use the command ‘zenml up’.
Now, we proceed to implement the experiment tracker and deploy the model:
If you’re running the ‘run_deployment.py’ script, you must first install some ZenML integrations. Integrations connect your pipeline to external tools, such as the environment where you deploy your model.
ZenML provides integrations with many MLOps tools. Here we need its MLflow integration, and installing it is a very important step.
To install this integration, use this command:
zenml integration install mlflow -y
This integration helps us manage those experiments efficiently.
Experiment tracking is a critical aspect of MLOps. We use ZenML and MLflow to monitor, record, and manage all aspects of our machine-learning experiments, facilitating efficient experimentation and reproducibility.
Register Experiment Tracker:
zenml experiment-tracker register mlflow_tracker --flavor=mlflow
Register Model Deployer:
zenml model-deployer register mlflow --flavor=mlflow
Stack:
zenml stack register mlflow_stack -a default -o default -d mlflow -e mlflow_tracker --set
Deployment is the final step in our pipeline, and it’s an essential part of our project. Our goal is not just to build the model; we want it deployed on the internet so that users can use it.
Deployment Pipeline Configuration: You have a deployment pipeline defined in a Python file named ‘deployment_pipeline.py.’ This pipeline manages the deployment tasks.
Deployment Trigger: There’s a step named ‘deployment_trigger’, which decides whether the model is deployed based on the ‘min_accuracy’ threshold.
class DeploymentTriggerConfig(BaseParameters):
    min_accuracy = 0

@step(enable_cache=False)
def dynamic_importer() -> str:
    """Downloads the latest data from a mock API."""
    data = get_data_for_test()
    return data
This code defines a class `DeploymentTriggerConfig` with a minimum accuracy parameter. In this case, it’s zero. It also defines a pipeline step, dynamic_importer, that downloads data from a mock API, with caching disabled for this step.
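The `deployment_trigger` step itself is not listed. A minimal sketch of the decision it presumably makes is shown below as a plain function. The `>=` comparison, the function signature, and the use of a dataclass in place of `BaseParameters` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class DeploymentTriggerConfig:
    """Stand-in for the ZenML parameter class shown above."""
    min_accuracy: float = 0.0


def deployment_trigger(accuracy: float, config: DeploymentTriggerConfig) -> bool:
    """Return True when the evaluated score is good enough to deploy."""
    # Assumed rule: deploy when the score meets or exceeds the threshold.
    return accuracy >= config.min_accuracy


print(deployment_trigger(0.81, DeploymentTriggerConfig()))
print(deployment_trigger(0.81, DeploymentTriggerConfig(min_accuracy=0.9)))
```

With the default threshold of zero, any non-negative score passes, so raising `min_accuracy` is what turns the trigger into a real quality gate.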
The ‘prediction_service_loader’ step retrieves the prediction service started by the deployment pipeline. It is used to manage and interact with the deployed model.
def prediction_service_loader(
    pipeline_name: str,
    pipeline_step_name: str,
    running: bool = True,
    model_name: str = "model",
) -> MLFlowDeploymentService:
    """Get the prediction service started by the deployment pipeline.
    Args:
        pipeline_name: name of the pipeline that deployed the MLflow prediction
            server
        pipeline_step_name: the name of the step that deployed the MLflow prediction
            server
        running: when this flag is set, the step only returns a running service
        model_name: the name of the model that is deployed
    """
    # get the MLflow model deployer stack component
    mlflow_model_deployer_component = MLFlowModelDeployer.get_active_model_deployer()
    # fetch existing services with same pipeline name, step name and model name
    existing_services = mlflow_model_deployer_component.find_model_server(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        model_name=model_name,
        running=running,
    )
    if not existing_services:
        raise RuntimeError(
            f"No MLflow prediction service deployed by the "
            f"{pipeline_step_name} step in the {pipeline_name} "
            f"pipeline for the '{model_name}' model is currently "
            f"running."
        )
    return existing_services[0]
This code defines a function `prediction_service_loader` that retrieves a prediction service started by a deployment pipeline.
The ‘predictor’ step runs inference requests against the prediction service. It processes incoming data and returns predictions.
@step
def predictor(
    service: MLFlowDeploymentService,
    data: str,
) -> np.ndarray:
    """Run an inference request against a prediction service"""
    service.start(timeout=10)  # should be a NOP if already started
    data = json.loads(data)  # Parse the input data from a JSON string into a Python dictionary.
    data.pop("columns")
    data.pop("index")
    columns_for_df = [  # Define a list of column names for creating a DataFrame.
        "Acceleration",
        "TopSpeed",
        "Range",
        "FastChargeSpeed",
        "PriceinUK",
        "PriceinGermany",
    ]
    df = pd.DataFrame(data["data"], columns=columns_for_df)
    json_list = json.loads(json.dumps(list(df.T.to_dict().values())))
    data = np.array(json_list)  # Convert the JSON list into a NumPy array.
    prediction = service.predict(data)
    return prediction
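The `predictor` step expects a JSON string with `columns`, `index`, and `data` keys, which matches the layout pandas produces with `to_json(orient="split")`. The sketch below builds such a payload by hand and turns it back into one record per row using only the standard library. The sample values and the helper name `payload_to_records` are invented for illustration.

```python
import json

COLUMNS = [
    "Acceleration",
    "TopSpeed",
    "Range",
    "FastChargeSpeed",
    "PriceinUK",
    "PriceinGermany",
]

# Hand-written payload in the "split" layout; the numbers are made up.
payload = json.dumps(
    {
        "columns": COLUMNS,
        "index": [0, 1],
        "data": [
            [4.6, 233, 450, 650, 40000, 45000],
            [7.3, 160, 270, 440, 28000, 31000],
        ],
    }
)


def payload_to_records(raw: str, columns=COLUMNS):
    """Parse the JSON string and return one {column: value} dict per row."""
    parsed = json.loads(raw)
    # Mirror the step: the stored header and index are discarded and the
    # expected column order is applied to the raw rows instead.
    parsed.pop("columns")
    parsed.pop("index")
    return [dict(zip(columns, row)) for row in parsed["data"]]


records = payload_to_records(payload)
print(records[0])
```

Because the column names are reapplied from a fixed list, the rows in the payload must already be in that exact order; a payload with reordered columns would be silently mislabeled.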
Deployment Execution: You have a script, ‘run_deployment.py’, that allows you to trigger the deployment process. This script takes a ‘--config’ command-line parameter, which can be set to ‘deploy’ for deploying the model, ‘predict’ for running predictions, or ‘deploy_and_predict’ for both.
Deployment Status and Interaction: The script also provides information about the status of the MLflow prediction server, including how to start and stop it. It uses MLFlow for model deployment.
Min Accuracy Threshold: The ‘min_accuracy’ parameter can be specified to set a minimum accuracy threshold for model deployment. If the model meets that threshold, it will be deployed.
Docker Configuration: Docker is used for managing the deployment environment, and you have defined Docker settings in your deployment pipeline.
This deployment process lets us deploy machine learning models and run predictions in a controlled and configurable manner.
python3 run_deployment.py --config deploy
Once the model is deployed, it is ready for predictions.
python3 run_deployment.py --config predict
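A command-line switch like `--config` can be implemented in several ways. The sketch below uses the standard library's `argparse` with the three values named above. The real ‘run_deployment.py’ may use a different library, and the `--min-accuracy` option name and the `plan_actions` helper are assumptions for illustration.

```python
import argparse

DEPLOY, PREDICT, DEPLOY_AND_PREDICT = "deploy", "predict", "deploy_and_predict"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the three documented --config values."""
    parser = argparse.ArgumentParser(description="Deploy a model and/or run predictions.")
    parser.add_argument(
        "--config",
        choices=[DEPLOY, PREDICT, DEPLOY_AND_PREDICT],
        default=DEPLOY_AND_PREDICT,
        help="Which part of the workflow to run.",
    )
    parser.add_argument(
        "--min-accuracy",
        type=float,
        default=0.0,
        help="Minimum score required before deploying (hypothetical option).",
    )
    return parser


def plan_actions(config: str) -> list:
    """Translate the --config value into the ordered actions to perform."""
    actions = []
    if config in (DEPLOY, DEPLOY_AND_PREDICT):
        actions.append("deploy")
    if config in (PREDICT, DEPLOY_AND_PREDICT):
        actions.append("predict")
    return actions


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--config", "predict"])
print(plan_actions(args.config))
```

Restricting the flag with `choices` means a typo such as ‘--config deply’ is rejected with a usage message instead of silently doing nothing.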
The Streamlit app provides a user-friendly interface for interacting with our model’s predictions. Streamlit simplifies the creation of interactive, web-based data science applications, making it easy for users to explore and understand the model’s predictions. Again, you can find the code on GitHub for the Streamlit app.
With this, you can explore and interact with our model’s predictions.
In this article, we’ve delved into an exciting project that demonstrates the power of MLOps in predicting electric vehicle efficiency. We’ve learned about ZenML and MLflow, which are crucial in creating an end-to-end machine-learning pipeline. We’ve also explored the data collection process, the problem statement, and the solution for accurately predicting electric vehicle efficiency.
This project highlights the significance of efficient electric vehicles and how MLOps can be harnessed to create precise models for forecasting efficiency. We’ve covered essential steps, including setting up a virtual environment, model development, configuring model settings, and evaluating model performance. The article concludes by emphasizing the importance of experiment tracking, deployment, and user interaction through a Streamlit app. With this project, we’re one step closer to shaping the future of electric vehicles.
A. MLflow manages the end-to-end machine learning lifecycle, enabling experiment tracking, model packaging, and deployment, making it easier to develop and deploy machine learning models.
A. MLOps and DevOps serve distinct but complementary purposes: MLOps is tailored for the machine learning lifecycle, while DevOps focuses on software development. Neither is better; their integration can optimize end-to-end development and deployment.
A. Yes, MLOps often involves coding for developing machine learning models and automating deployment and management processes.
A. MLflow simplifies machine learning development by providing tools for experiment tracking, model versioning, and model deployment.
A. Yes, ZenML is a fully open-source MLOps framework that makes the transition from local development to production pipelines as easy as 1 line of code.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.