How to Fine-Tune Large Language Models with MonsterAPI

Avikumar Talaviya | Last Updated: 23 Jul, 2024
7 min read

Introduction

Imagine if your virtual assistant could understand and anticipate your needs perfectly. This vision is becoming a reality with advancements in large language models (LLMs). However, to tailor these models to specific tasks, fine-tuning is essential. Think of it as sculpting a rough block into a precise masterpiece. MonsterAPI simplifies this process, making fine-tuning and evaluation accessible and efficient. In this guide, we’ll show you how MonsterAPI helps refine and assess LLMs, turning them into powerful tools for your unique needs.


Learning Objectives

  • Understand the complete fine-tuning and evaluation workflow on the MonsterAPI platform.
  • Explore why evaluating fine-tuned models is necessary for accuracy and coherence in generated answers.
  • Follow a hands-on guide to fine-tuning and evaluation using MonsterAPI's developer-friendly, easy-to-use APIs.

Evolution of Large Language Models

Large language models have seen significant advancements in recent years as the field of natural language processing keeps growing. Many closed-source and open-source models are being published for researchers and developers to advance the AI field. These LLMs perform exceptionally well on general tasks, answering a wide range of queries, but to personalize them and achieve greater accuracy on specific tasks, we need to fine-tune them.

Fine-tuning transforms pre-trained models into context-specific models by adapting them to a domain through training on custom datasets. Fine-tuning requires a dedicated dataset to train the LLM, which is then deployed on a server for specific use cases. Along with fine-tuning, it is also crucial to evaluate these models to measure their effectiveness on the variety of domain-related tasks a business intends to handle.

MonsterAPI helps developers and businesses with fine-tuning and evaluation using the ‘lm_eval’ engine. MonsterAPI offers both no-code and code-based fine-tuning APIs that simplify the entire process. The following are the benefits of MonsterAPI:

  • Automates the configuration of GPU computing environments.
  • Optimises memory usage by finding the optimal batch size.
  • Allows manual model configuration for business-specific requirements.
  • Integrates model experiment tracking using WandB.
  • Integrates an evaluation engine to test model performance against benchmarks.

What is LLM Fine-tuning and How Does it Work?

Fine-tuning is a technique for training a pre-trained LLM on a custom dataset for a specific task. It modifies the parameters of the pre-trained LLM so that it evolves into a task-specific model while leveraging the vast general knowledge of the pre-trained LLM. Fine-tuning is done through the following process:

  • Pre-trained model selection: First, businesses need to choose a suitable pre-trained model from the many available, such as Llama, SDXL, Claude, or Gemma, depending on their needs.
  • Dataset preparation: Gather a custom dataset specific to the task for which you’re training the LLM. Pre-process and structure the dataset in an input-output format so it can be parsed during fine-tuning (see the sample record after this list).
  • Model training: Once the dataset is prepared, the pre-trained model is trained on the specific task. During this phase, model weights are adjusted based on the new data, enabling the model to learn new patterns from the custom dataset. MonsterAPI helps fine-tune models using highly optimised and cost-friendly GPUs. We will cover this process in depth in the upcoming sections.
  • Hyperparameter tuning: The fine-tuning process also requires optimising hyperparameters such as batch size, learning rate, training epochs, and GPU configuration.
  • Evaluation of fine-tuned models: Once the model is trained, we evaluate its performance using benchmarks such as MMLU, GSM8K, and TruthfulQA before it goes into production. MonsterAPI provides an integrated evaluation API so that developers can test their models once they are fine-tuned on the custom dataset. We will learn more about LLM evaluation in the next section.
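
To make the dataset-preparation step concrete, here is a minimal sketch of what a single instruction/output record might look like, similar to the Alpaca-style format used later in this guide. The field names and values are illustrative and depend on the dataset you choose.

# Illustrative record in an input-output (instruction/output) format.
# Field names are assumptions; match them to your chosen dataset.
sample_record = {
    "instruction": "Summarize the following support ticket in one sentence.",
    "input": "Customer reports the mobile app crashes when uploading photos larger than 10 MB.",
    "output": "The mobile app crashes when users upload photos over 10 MB."
}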

What is LLM Evaluation?

LLM evaluation is the assessment of a fine-tuned model's performance and effectiveness on the targeted task we want to achieve. Evaluation ensures the model meets the desired accuracy, coherence, and consistency on the validation dataset.

A wide range of evaluation metrics, such as MMLU and GSM8k, test the performance of language models on validation datasets. Comparing these evaluations against benchmarks reveals areas for further improvement in model performance.

MonsterAPI provides a comprehensive LLM evaluation engine to test and assess the fine-tuned model. The evaluation API can be used as follows:

import requests

# Evaluation API endpoint
url = "https://api.monsterapi.ai/v1/evaluation/llm"

# Deployment name, base model path, evaluation engine, and benchmark tasks
payload = {
    "deployment_name": "Model_deployment_name",
    "basemodel_path": "mistralai/Mistral-7B-v0.1",
    "eval_engine": "lm_eval",
    "task": "gsm8k,hellaswag"
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "Bearer YOUR_MONSTER_API_KEY"  # MonsterAPI key, as in the evaluation example later in this guide
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)

As seen in the above code snippet, the deployment name, base model path, eval_engine, and evaluation tasks are loaded into the POST request, which returns a comprehensive report of model performance. Now let us look at the step-by-step guide to fine-tuning and evaluating models using MonsterAPI, with code examples.

Step-by-Step Guide to LLM Fine-tuning and Evaluation Using Monster API

MonsterAPI's LLM fine-tuner is 10X faster and more efficient, with the lowest fine-tuning cost among its alternatives. It supports fine-tuning a wide range of models for specific tasks across text generation, code generation, speech-to-text, text-to-speech, and image generation. In this guide, we will walk through the fine-tuning process for a text generation model, followed by evaluation of the model using the MonsterAPI lm_eval engine.

MonsterAPI uses a network of computing resources built on NVIDIA A100 GPUs, with memory ranging from 8 GB to 80 GB depending on the size of the model and the configured hyperparameters. Let’s compare the time taken and cost of fine-tuning models across various platforms to choose the right one for your product.

Platform/Service Provider | Model Name      | Time Taken  | Cost of Fine-tuning
MonsterAPI                | Falcon-7B       | 27 min 26 s | $5-6
MonsterAPI                | Llama-7B        | 115 min     | $6
MosaicML                  | MPT-7B-Instruct | 2.3 hours   | $37
Valohai                   | Mistral-7B      | 3 hours     | $1.5
Mistral                   | Mistral-7B      | 2-3 hours   | $4

Step 1: Set Up the Environment and Install Relevant Libraries

Before we begin fine-tuning the large language model, we need to install the necessary libraries and set up the MonsterAPI key so we can launch a fine-tuning job by initialising the MonsterAPI client. Sign up on MonsterAPI to get a free API key for your project. In the below code snippet, we set up the project environment for our fine-tuning process.

!pip install monsterapi==1.0.8

import os
import json
import logging
import requests
import huggingface_hub as hf_hub
from huggingface_hub import HfApi, hf_hub_download, file_exists
from monsterapi import client as mclient

# Add your MonsterAPI key here and initialise the client
os.environ['MONSTER_API_KEY'] = 'YOUR_MONSTER_API_KEY'
client = mclient(api_key=os.environ.get("MONSTER_API_KEY"))

Step 2: Prepare the Payload and Launch the Fine-tuning Job

Once the project environment is set, we build a launch payload that consists of the base model path, LoRA parameters, data source path, and training details such as epochs and learning rate for our fine-tuning job. Once the launch payload is ready, we call the MonsterAPI client to run the process and get the fine-tuned model without hassle. In the below code snippet, we set up the launch payload for our fine-tuning job.

# prepare the launch payload
launch_payload = {
    "pretrainedmodel_config": {
        "model_path": "huggyllama/llama-7b",
        "use_lora": True,
        "lora_r": 8,
        "lora_alpha": 16,
        "lora_dropout": 0,
        "lora_bias": "none",
        "use_quantization": False,
        "use_gradient_checkpointing": False,
        "parallelization": "nmp"
    },
    "data_config": {
        "data_path": "tatsu-lab/alpaca",
        "data_subset": "default",
        "data_source_type": "hub_link",
        "prompt_template": "Here is an example on how to use 
        tatsu-lab/alpaca dataset 
        ### Input: {instruction} ### Output: {output}",
        "cutoff_len": 512,
        "prevalidated": False
    },
    "training_config": {
        "early_stopping_patience": 5,
        "num_train_epochs": 1,
        "gradient_accumulation_steps": 1,
        "warmup_steps": 50,
        "learning_rate": 0.001,
        "lr_scheduler_type": "reduce_lr_on_plateau",
        "group_by_length": False
    },
    "logging_config": { "use_wandb": False }
}

# finetune the service using configured params
ret = client.finetune(service="llm", params=launch_payload)
deployment_id = ret.get("deployment_id")
print(ret)

In the above code, we have the following key configurations for fine-tuning the pre-trained model on a custom dataset.

  • pretrainedmodel_config: Takes the pre-trained model path (here huggyllama/llama-7b) and LoRA parameters such as lora_r, lora_alpha, and lora_dropout; this model serves as the base for fine-tuning on our dataset. Llama-7B is trained using an optimized transformer architecture and is efficient at language and text-generation tasks.
  • data_config: Takes the data source path, which can be a custom dataset or a dataset from the Hugging Face hub, along with a prompt template based on the input-output structure (a sketch of how the template maps a record to training text follows this list).
  • training_config: Takes training configurations such as epochs, learning rate, and early-stopping patience for specifying training parameters.
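
To make the prompt template concrete, here is a minimal sketch of how a template with {instruction} and {output} placeholders maps a single Alpaca-style record to a training string. The record contents are illustrative, and this formatting step is shown for clarity rather than as part of the MonsterAPI SDK, which applies the template internally.

# Illustrative only: how a prompt template turns a dataset record into training text
prompt_template = "### Input: {instruction} ### Output: {output}"

record = {
    "instruction": "Translate 'good morning' to French.",
    "output": "Bonjour."
}

training_text = prompt_template.format(**record)
print(training_text)
# ### Input: Translate 'good morning' to French. ### Output: Bonjour.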

Step 3: Fetch the Fine-tuning Job Status and Job Logs

After the fine-tuning process, which can take 5-10 minutes, we can check the model deployment status and fetch the fine-tuning job logs to review the training process; a polling sketch follows the snippet below. Check out the official MonsterAPI website for more information on LLM fine-tuning.

# Get deployment status
status_ret = client.get_deployment_status(deployment_id)
print(status_ret)

# Get deployment logs
logs_ret = client.get_deployment_logs(deployment_id)
print(logs_ret)
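
Because fine-tuning runs asynchronously, you may want to poll the deployment status until the job reaches a terminal state before moving on to evaluation. The sketch below makes assumptions about the response shape: the "status" key and the terminal values "completed" and "failed" are placeholders, so check the actual output of get_deployment_status for the exact field names.

import time

# Minimal polling sketch: the "status" key and its terminal values are assumptions;
# inspect the real get_deployment_status response for the exact keys.
while True:
    status_ret = client.get_deployment_status(deployment_id)
    state = status_ret.get("status")
    print(f"Current fine-tuning job state: {state}")
    if state in ("completed", "failed"):
        break
    time.sleep(60)  # wait a minute between checks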

Step 4: Evaluate the Fine-tuned Model and Get Scores Using the LLM Evaluation Engine

Once the context-specific model is trained, we evaluate it using the platform’s LLM evaluation API to test the model's accuracy. MonsterAPI offers a comprehensive report of model insights based on evaluation benchmarks such as MMLU, GSM8K, HellaSwag, ARC, and TruthfulQA. In the below code, we send a payload to the evaluation API, which evaluates the deployed model and returns the metrics and report from the result URL.

import requests
base_model = launch_payload['pretrainedmodel_config']['model_path']
lora_model_path = status_ret['info']['model_url']


# evaluation api URL
url = "https://api.monsterapi.ai/v1/evaluation/llm"

payload = {
    "eval_engine": "lm_eval",
    "basemodel_path": base_model,
    "loramodel_path": lora_model_path,
    "task": "mmlu"
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": f"Bearer {os.environ['MONSTER_API_KEY']}"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
# Extracting deployment ID from response
response_data = response.json()
serving_params = response_data.get("servingParams", {})
eval_deployment_id = serving_params.get("deployment_id")

# Get evaluation deployment status (includes the result URL once finished)
logs_ret = client.get_deployment_status(eval_deployment_id)
print(logs_ret)

result_url = logs_ret["info"]["result_url"]

response = requests.get(result_url)
result_json = response.json()

print(result_json)
# Extract required values from the JSON
Evaluation_Metrics = {
    "MMLU": result_json["results"]["mmlu"]["acc,none"]
}
print(Evaluation_Metrics)

The above code evaluates the fine-tuned model with the ‘lm_eval’ engine on the MMLU benchmark using MonsterAPI. To learn more about evaluating models, check out the MonsterAPI evaluation API documentation.
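
If you evaluate on several tasks at once (for example, "task": "mmlu,gsm8k,hellaswag"), the metric extraction above can be generalized. The sketch below assumes each task's entry in result_json["results"] exposes an "acc,none" accuracy key, as the MMLU result does here; other tasks may report different metric names, so adjust the key accordingly.

# Hypothetical generalization: collect the "acc,none" accuracy for every evaluated
# task, skipping any task that reports a different metric name.
evaluation_metrics = {
    task: scores["acc,none"]
    for task, scores in result_json.get("results", {}).items()
    if "acc,none" in scores
}
print(evaluation_metrics)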

Conclusion

Fine-tuning LLMs significantly enhances their performance on specific tasks, and evaluating these models is crucial to ensure their effectiveness and reliability. The MonsterAPI platform offers robust tools for fine-tuning and evaluation, streamlining the process and providing precise performance metrics. By leveraging MonsterAPI’s LLM evaluation engine, developers can build high-quality, specialized language models with confidence, ensuring they meet the desired standards and perform optimally in real-world applications for their context and domain. Thus, the MonsterAPI platform provides a state-of-the-art solution for fine-tuning and evaluation, with a comprehensive report, to develop custom models in a few lines of code.

Key Takeaways

  • We gained comprehensive insight into the LLM fine-tuning process, from model selection to fine-tuned model evaluation, using MonsterAPI’s easy-to-use platform.
  • MonsterAPI automates GPU configuration for optimised model training and provides performance measurement, as shown in the code examples.
  • We walked through hands-on code to fine-tune and evaluate a large language model, an approach that can be applied to custom datasets.

Frequently Asked Questions

Q1. What is the fine-tuning and evaluation of LLMs?

A. Fine-tuning is the process of adapting the pre-trained weights of a model to a custom dataset of domain-specific tasks and queries. Evaluation is the process of assessing a model's accuracy against industry benchmarks to ensure high-quality model development.

Q2. How does MonsterAPI help in fine-tuning large language models?

A. MonsterAPI provides hosted APIs for fine-tuning and evaluating LLMs at low cost with optimized computing resources.

Q3. What types of datasets are supported for fine-tuning LLMs?

A. Datasets such as text, codebases, images, and videos can be used for fine-tuning, depending on the base model selected for the fine-tuning process.

I specialize in data science and machine learning with hands-on experience working on various end-to-end data science projects. I am the chapter co-lead of the Mumbai local chapter of Omdena. I am also a Kaggle Master and an educator ambassador at Streamlit, working with volunteers around the world.
