How to Fine-Tune Large Language Models with MonsterAPI

Avikumar Talaviya 23 Jul, 2024
7 min read

Introduction

Imagine if your virtual assistant could understand and anticipate your needs perfectly. This vision is becoming a reality with advancements in large language models (LLMs). However, to tailor these models to specific tasks, fine-tuning is essential. Think of it as sculpting a rough block into a precise masterpiece. MonsterAPI simplifies this process, making fine-tuning and evaluation accessible and efficient. In this guide, we’ll show you how MonsterAPI helps refine and assess LLMs, turning them into powerful tools for your unique needs.


Learning Objectives

  • Understanding the complete process of fine-tuning and evaluation using the MonsterAPI platform.
  • Exploring why evaluating fine-tuned models is necessary for accuracy and coherence in generated answers.
  • A hands-on guide to fine-tuning and evaluation using MonsterAPI's developer-friendly, easy-to-use APIs.

Evolution of Large Language Models

Large language models have seen significant advancements in recent years as the field of natural language processing keeps growing. Many closed-source and open-source models are being published, allowing researchers and developers to advance the AI field. These LLMs perform exceptionally well on general tasks, answering a wide range of queries, but to personalize these models and achieve greater accuracy on specific tasks, we need to fine-tune them.

Fine-tuning transforms pre-trained models into context-specific models by training them further on custom, domain-specific datasets. Fine-tuning requires a dedicated dataset to train the LLM, which is then deployed on a server for specific use cases. Along with fine-tuning, it is also crucial to evaluate these models to measure their effectiveness on the variety of domain-related tasks that businesses intend to perform.

MonsterAPI helps developers and businesses with fine-tuning and evaluation using its 'lm_eval' engine. MonsterAPI provides both no-code and code-based fine-tuning APIs that simplify the entire process. The following are the benefits of MonsterAPI:

  • Automated configuration of GPU computing environments.
  • Memory usage optimization by finding the optimal batch size.
  • Manual model configuration for business-specific requirements.
  • Integrated experiment tracking using WandB.
  • Integrated evaluation engine to test model performance against benchmarks.

What is LLM Fine-tuning and How Does it Work?

Fine-tuning is a technique for training a pre-trained LLM on a custom dataset for a specific task. It modifies the parameters of the pre-trained LLM so that it evolves into a task-specific LLM while leveraging the vast general knowledge acquired during pre-training. Fine-tuning is done through the following process:

  • Pre-trained model selection: First, businesses need to choose a suitable pre-trained model from the various available models such as Llama, SDXL, Claude, Gemma, etc., depending upon their needs.
  • Dataset preparation: Gather a custom dataset specific to the task for which you're training the LLM. Pre-process and structure the dataset in an input-output format for parsing during the fine-tuning process (see the sketch after this list).
  • Model training: Once the dataset is prepared, the pre-trained model is trained for the specific task. During this phase, model weights are adjusted based on the new data, enabling the model to learn new patterns from the custom dataset. MonsterAPI helps fine-tune models with highly optimised and cost-friendly GPUs. We will learn more about the process in-depth in the upcoming sections.
  • Hyperparameter tuning: The fine-tuning process also needs optimization of hyperparameters such as batch size, learning rate, training epochs, GPU configuration, etc.
  • Evaluation of fine-tuned models: Once the model is trained, we need to evaluate its performance using benchmarks such as MMLU, GSM8K, TruthfulQA, etc. before putting it into production. MonsterAPI provides an integrated evaluation API so that developers can test their models once they are fine-tuned on the custom dataset. We will learn more about LLM evaluation in the next section.
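
As a concrete illustration of the dataset-preparation step above, here is a minimal sketch that converts raw question-answer pairs into the instruction/output records used by instruction-tuning datasets such as tatsu-lab/alpaca. The raw_examples list and the custom_dataset.json filename are hypothetical placeholders for your own data.

import json

# Hypothetical raw question-answer pairs collected for the target task
raw_examples = [
    {"question": "What is fine-tuning?", "answer": "Adapting a pre-trained model to a specific task."},
    {"question": "Why evaluate a model?", "answer": "To verify accuracy on the target domain."},
]

# Structure each pair into the input-output (instruction/output) format used during fine-tuning
records = [{"instruction": ex["question"], "output": ex["answer"]} for ex in raw_examples]

# Save as JSON so the dataset can be uploaded or pushed to the Hugging Face Hub
with open("custom_dataset.json", "w") as f:
    json.dump(records, f, indent=2)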

What is LLM Evaluation?

LLM evaluation is the assessment of a fine-tuned model's performance and effectiveness on the targeted task we want to achieve. The evaluation ensures the model meets the desired accuracy, coherence, and consistency on a validation dataset.

A wide range of evaluation metrics, such as MMLU and GSM8k, test the performance of language models on validation datasets. Comparing these evaluations against benchmarks reveals areas for further improvement in model performance.
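
To make the idea concrete, a benchmark score such as GSM8k accuracy ultimately boils down to comparing model outputs against reference answers over a validation set. The snippet below is a simplified, generic illustration of exact-match accuracy, not MonsterAPI's internal implementation; the predictions and references are made-up placeholders.

# Simplified illustration of exact-match accuracy over a validation set
predictions = ["42", "Paris", "8"]   # hypothetical model answers
references = ["42", "Paris", "6"]    # hypothetical ground-truth answers

correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
accuracy = correct / len(references)
print(f"Exact-match accuracy: {accuracy:.2%}")  # 66.67% for this toy example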

MonsterAPI provides a comprehensive LLM evaluation engine to test and assess fine-tuned models. The evaluation API can be used as follows:

import requests

url = "https://api.monsterapi.ai/v1/evaluation/llm"

payload = {
    "deployment_name": "Model_deployment_name",
    "basemodel_path": "mistralai/Mistral-7B-v0.1",
    "eval_engine": "lm_eval",
    "task": "gsm8k,hellaswag"
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    # authenticate with your MonsterAPI key, as in the full example later in this guide
    "authorization": "Bearer YOUR_MONSTER_API_KEY"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)

As seen in the above code snippet, the deployed model name, the base model path, the eval_engine, and the evaluation tasks are loaded into the POST request that evaluates the model, which results in a comprehensive report of model performance. Now we will look at the step-by-step guide to fine-tuning and evaluating models using MonsterAPI, with code examples.

Step-by-Step Guide to LLM Fine-tuning and Evaluation Using Monster API

The MonsterAPI LLM fine-tuner is 10X faster and more efficient, with the lowest fine-tuning cost among its alternatives. It supports a wide range of models for text generation, code generation, speech-to-text and text-to-speech translation, and image generation, all of which can be fine-tuned for specific tasks. In this guide, we will walk through the fine-tuning process for a text generation model, followed by evaluation of the model using the MonsterAPI lm_eval engine.

MonsterAPI uses a network of computing resources built on NVIDIA A100 GPUs with memory ranging from 8 GB to 80 GB, depending upon the size of the model and the configured hyperparameters. Let's compare the time taken and the cost of fine-tuning models on various platforms to choose the right platform for your product.

| Platform/service provider | Model name      | Time taken  | Cost of fine-tuning |
|---------------------------|-----------------|-------------|---------------------|
| MonsterAPI                | Falcon-7B       | 27 min 26 s | $5-6                |
| MonsterAPI                | Llama-7B        | 115 mins    | $6                  |
| MosaicML                  | MPT-7B-Instruct | 2.3 hours   | $37                 |
| Valohai                   | Mistral-7B      | 3 hours     | $1.5                |
| Mistral                   | Mistral-7B      | 2-3 hours   | $4                  |

Step 1: Set Up the Environment and Install Relevant Libraries

Before we begin fine-tuning the large language model, we need to install the necessary libraries and set up the MonsterAPI key, then launch a fine-tuning job by initialising the MonsterAPI client. Sign up on MonsterAPI to get a FREE API key for your project. In the below code snippet, we set up the project environment for our fine-tuning process.

!pip install monsterapi==1.0.8

import os
from monsterapi import client as mclient
import json
import logging
import requests
import huggingface_hub as hf_hub
from huggingface_hub import HfApi, hf_hub_download, file_exists

# Add monster API key over here
os.environ['MONSTER_API_KEY'] = 'YOUR_MONSTER_API_KEY'
client = mclient(api_key=os.environ.get("MONSTER_API_KEY"))

Step 2: Prepare the Payload and Launch the Fine-tuning Job

Once the project environment is set, we build a launch payload that consists of the base model path, LoRA parameters, data source path, and training details such as epochs and learning rate for our fine-tuning job. Once the launch payload is ready, we call the MonsterAPI client to run the process and get the fine-tuned model without hassle. In the below code snippet, we have set up a launch payload for our fine-tuning job.

# prepare a launchpad 
launch_payload = {
    "pretrainedmodel_config": {
        "model_path": "huggyllama/llama-7b",
        "use_lora": True,
        "lora_r": 8,
        "lora_alpha": 16,
        "lora_dropout": 0,
        "lora_bias": "none",
        "use_quantization": False,
        "use_gradient_checkpointing": False,
        "parallelization": "nmp"
    },
    "data_config": {
        "data_path": "tatsu-lab/alpaca",
        "data_subset": "default",
        "data_source_type": "hub_link",
        "prompt_template": "Here is an example on how to use 
        tatsu-lab/alpaca dataset 
        ### Input: {instruction} ### Output: {output}",
        "cutoff_len": 512,
        "prevalidated": False
    },
    "training_config": {
        "early_stopping_patience": 5,
        "num_train_epochs": 1,
        "gradient_accumulation_steps": 1,
        "warmup_steps": 50,
        "learning_rate": 0.001,
        "lr_scheduler_type": "reduce_lr_on_plateau",
        "group_by_length": False
    },
    "logging_config": { "use_wandb": False }
}

# finetune the service using configured params
ret = client.finetune(service="llm", params=launch_payload)
deployment_id = ret.get("deployment_id")
print(ret)

In the above code, we have the following key configurations for fine-tuning the pre-trained model on a custom dataset.

  • Pretrainedmodel_config: It takes a pre-trained model path such as llama-7b along with LoRA parameters like lora_r, lora_alpha, and lora_dropout, which define the base model and adapters for our fine-tuning run (see the sketch after this list for what lora_r controls). Llama-7B is trained using an optimized transformer architecture and is efficient at language and text generation tasks.
  • Data_config: It takes a data source path, which can be custom data or a dataset from the Hugging Face Hub, with a prompt template based on the input and output structure.
  • Training_config: It takes training configurations like epochs, learning rate, and early stopping patience for specifying training parameters.
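
To give a sense of what lora_r controls, the sketch below estimates how many trainable parameters a rank-8 LoRA adapter adds to a single weight matrix: for a d x k matrix, LoRA trains two low-rank factors with r * (d + k) parameters in total. The 4096 x 4096 shape is assumed here as a typical attention projection size for 7B-parameter Llama-style models.

# Trainable parameters added by a LoRA adapter of rank r on a d x k weight matrix
def lora_param_count(d: int, k: int, r: int) -> int:
    # LoRA factorizes the weight update as B (d x r) @ A (r x k), so only r * (d + k) weights are trained
    return r * (d + k)

d, k, r = 4096, 4096, 8  # assumed projection size for a 7B Llama-style model, with lora_r = 8
print(lora_param_count(d, k, r))  # 65,536 trainable parameters for this single matrix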

Step 3: Fetch Fine-tuning Job Status and Job Logs

After the fine-tuning process, which can take 5-10 minutes, we can confirm the model deployment status and fetch the fine-tuning job logs to review the training process. Check out our official website for more information on LLM fine-tuning here.

# Get deployment status
status_ret = client.get_deployment_status(deployment_id)
print(status_ret)

# Get deployment logs
logs_ret = client.get_deployment_logs(deployment_id)
print(logs_ret)
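
Because the fine-tuning job runs asynchronously, a simple polling loop can wait for the deployment to finish before moving on to evaluation. The sketch below is a minimal example: the terminal status strings ("live", "completed", "failed") are assumptions, so inspect the status payload returned for your own job to confirm the actual values.

import time

# Poll the deployment status until the fine-tuning job reaches a terminal state
# NOTE: the status values below are assumed; check status_ret for the real ones
while True:
    status_ret = client.get_deployment_status(deployment_id)
    status = status_ret.get("status", "")
    print(f"Current status: {status}")
    if status in ("live", "completed", "failed"):
        break
    time.sleep(60)  # wait a minute between checks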

Step 4: Evaluate the Fine-tuned Model and Get Scores Using the LLM Evaluation Engine

Once the context-specific model is trained, we evaluate the fine-tuned model using the platform's LLM evaluation API to test the model's accuracy. MonsterAPI offers a comprehensive report of model insights based on the given evaluation benchmarks, such as MMLU, GSM8K, HellaSwag, ARC, and TruthfulQA. In the below code, we send a payload to the evaluation API that evaluates the deployed model, and then fetch the metrics and report from the result URL.

import requests

base_model = launch_payload['pretrainedmodel_config']['model_path']
lora_model_path = status_ret['info']['model_url']

# evaluation api URL
url = "https://api.monsterapi.ai/v1/evaluation/llm"

payload = {
    "eval_engine": "lm_eval",
    "basemodel_path": base_model,
    "loramodel_path": lora_model_path,
    "task": "mmlu"
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": f"Bearer {os.environ['MONSTER_API_KEY']}"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
# Extracting deployment ID from response
response_data = response.json()
serving_params = response_data.get("servingParams", {})
eval_deployment_id = serving_params.get("deployment_id")

# Get evaluation deployment status
logs_ret = client.get_deployment_status(eval_deployment_id)
print(logs_ret)

result_url = logs_ret["info"]["result_url"]

response = requests.get(result_url)
result_json = response.json()

print(result_json)
# Extract required values from the JSON
Evaluation_Metrics = {
    "MMLU": result_json["results"]["mmlu"]["acc,none"]
}
print(Evaluation_Metrics)

The above code evaluates the fine-tuned model with the 'lm_eval' engine on the MMLU benchmark using MonsterAPI. To learn more about the evaluation of models, check out the API page here.

Conclusion

Fine-tuning LLMs significantly enhances their performance on specific tasks, and evaluating these models is crucial to ensure their effectiveness and reliability. The MonsterAPI platform offers robust tools for fine-tuning and evaluation, streamlining the process and providing precise performance metrics. By leveraging MonsterAPI's LLM evaluation engine, developers can build high-quality, specialized language models with confidence, ensuring they meet the desired standards and perform optimally in real-world applications for their context and domain. In short, the MonsterAPI platform provides a state-of-the-art solution for fine-tuning and evaluation, with a comprehensive report, to develop custom models in a few lines of code.

Key Takeaways

  • We gained comprehensive insights into the LLM fine-tuning process, from model selection to fine-tuned model evaluation, using MonsterAPI's easy-to-use platform.
  • MonsterAPI automates GPU configuration for optimised model training and performance measurement, as shown in the code examples.
  • We walked through hands-on code to fine-tune and evaluate a large language model, which can be applied to custom datasets.

Frequently Asked Questions

Q1. What is the fine-tuning and evaluation of LLMs?

A. Fine-tuning is the process of adapting the pre-trained weights of a model to a custom dataset of domain-specific tasks and queries. Evaluation is the process of assessing the accuracy of models against industry benchmarks to ensure high-quality model development.

Q2. How does MonsterAPI help in fine-tuning large language models?

A. MonsterAPI provides hosted APIs for fine-tuning and evaluating LLMs at low cost with optimized computing resources.

Q3. What types of datasets are supported for fine-tuning LLMs?

A. Datasets such as text, codebases, images, and videos can be used for fine-tuning, depending on the base model selected for the fine-tuning process.

Avikumar Talaviya 23 Jul, 2024

I specialize in data science and machine learning with hands-on experience working on various end-to-end data science projects. I am the chapter co-lead of the Mumbai local chapter of Omdena. I am also a Kaggle Master and Educator Ambassador at Streamlit with volunteers around the world.
