Advances in Natural Language Processing (NLP) and Artificial Intelligence (AI) have produced Large Language Models (LLMs) that can understand and generate human-like text, enabling applications such as chatbots and document summarization. To get the most out of these models, however, they need to be fine-tuned for specific use cases. Ludwig, a low-code framework, is designed for creating custom AI models, including LLMs and other deep neural networks. This article is a step-by-step guide to fine-tuning LLMs with Ludwig for real-world scenarios.
This article was published as a part of the Data Science Blogathon.
Ludwig, known for its user-friendly, low-code approach, supports a wide array of machine learning (ML) and deep learning applications. This flexibility makes it an ideal choice for developers and researchers aiming to build custom AI models without deep programming requirements. Ludwig’s capabilities include but are not limited to training, fine-tuning, hyperparameter optimization, model visualization, and deployment.
Before we start, let's get familiar with Ludwig and its ecosystem. As introduced earlier, Ludwig is a low-code framework for building custom AI models such as Large Language Models and other deep neural networks. Technically, Ludwig can be used to train and fine-tune any neural network, and it supports a wide range of machine learning and deep learning use cases. It also supports visualizations, hyperparameter tuning, explainable AI, model benchmarking, and model serving.
Ludwig is driven by a YAML file in which all configuration is specified: the model name, the type of task to perform, the number of epochs to run when fine-tuning, the training and fine-tuning hyperparameters, the quantization settings, and so on. Ludwig supports a wide range of LLM-focused tasks such as zero-shot batch inference, RAG, adapter-based fine-tuning for text generation, and instruction tuning. In this article, we will fine-tune the Mistral 7B model to follow human instructions. We will also explore how to define a YAML configuration for Ludwig.
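To see why the quantization settings matter for a model of this size, here is a rough back-of-the-envelope sketch. It assumes a round figure of 7 billion parameters and counts only the memory for the weights themselves; activations, adapter weights, optimizer state, and quantization overhead are ignored, so real usage will be higher. The function name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    bytes_total = num_parameters * bits_per_weight / 8
    return bytes_total / 1e9


# Assumption: roughly 7 billion parameters, a round figure for a "7B" model.
APPROX_PARAMS = 7e9

for bits in (16, 8, 4):
    estimate = weight_memory_gb(APPROX_PARAMS, bits)
    print(f"{bits}-bit weights: ~{estimate:.1f} GB")
```

Under these assumptions, 16-bit weights need about 14 GB while 4-bit weights need about 3.5 GB, which is why 4-bit quantization makes fine-tuning on a single GPU much more practical.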
It’s critical to understand the prerequisites and the setup required:
Setting Up the Development Environment: I ran this code in a VS Code environment, but it can also be run in a Kaggle notebook, on a Jupyter server, or in Google Colab.
Install the packages below. Run the final transformers command only if you get a Transformers version runtime error.
%pip install ludwig==0.10.0 ludwig[llm]
%pip install torch==2.1.2
%pip install PyYAML==6.0
%pip install datasets==2.18.0
%pip install pandas==2.1.4
%pip install transformers==4.30.2
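If a version error does appear, it can help to check what is actually installed before reinstalling anything. The following is a small sketch using only the standard library; the pinned versions are copied from the install commands above, and the helper names `installed_version` and `check_pins` are made up for this illustration.

```python
from importlib import metadata

# Versions pinned by the install commands above.
PINNED = {
    "ludwig": "0.10.0",
    "torch": "2.1.2",
    "PyYAML": "6.0",
    "datasets": "2.18.0",
    "pandas": "2.1.4",
    "transformers": "4.30.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each package name to (installed_version, matches_pin)."""
    report = {}
    for package, wanted in pins.items():
        found = installed_version(package)
        report[package] = (found, found == wanted)
    return report


for package, (found, ok) in check_pins(PINNED).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{package}: installed={found} pinned={PINNED[package]} [{status}]")
```

A mismatch is not necessarily a problem; the report only shows where the environment differs from the versions used in this article.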
import yaml
import logging
import torch
import datasets
import pandas as pd
from ludwig.api import LudwigModel
For this guide, we will use the Alpaca dataset from Stanford, specifically designed for instruction-based fine-tuning of LLMs. The dataset, created using OpenAI’s text-davinci-003 engine, comprises 52,000 entries with columns for the instruction, an optional input that provides further context, and the expected output.
We’ll focus on the first 5,000 rows to keep the computational demands manageable. The dataset is loaded through Hugging Face’s datasets library and converted into a pandas DataFrame.
data = datasets.load_dataset("tatsu-lab/alpaca")
df = pd.DataFrame(data["train"])
df = df[["instruction", "input", "output"]]
df.head()
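Create a YAML configuration file named model.yaml to set up a model for fine-tuning using Ludwig. The configuration includes:
Model Type: Identified as an LLM.
Base Model: mistralai/Mistral-7B-Instruct-v0.2.
Input and Output Features: The instruction column as text input and the output column as text output.
Prompt: A template that wraps each row's instruction and input before it is sent to the model.
Generation: A temperature of 0.1 and at most 64 new tokens.
Adapter and Quantization: A LoRA adapter on top of a 4-bit quantized base model.
Preprocessing: A maximum sequence length of 512 and a random 95/0/5 split.
Trainer: Fine-tuning settings such as epochs, batch size, optimizer, gradient accumulation, and learning rate schedule.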
This YAML configuration organizes and specifies all necessary parameters for effective model training and fine-tuning. For additional customization, refer to Ludwig’s documentation.
Below is an example that defines these settings inline in Python, parsing the YAML string with yaml.safe_load:
import os
import logging
from ludwig.api import LudwigModel
# Set your Hugging Face authentication token here
hugging_face_token = "<your_huggingface_api_token>"
os.environ["HUGGING_FACE_HUB_TOKEN"] = hugging_face_token
qlora_fine_tuning_config = yaml.safe_load(
"""
model_type: llm
base_model: mistralai/Mistral-7B-Instruct-v0.2
input_features:
  - name: instruction
    type: text
output_features:
  - name: output
    type: text
prompt:
  template: >-
    Below is an instruction that describes a task, paired with an input
    that provides further context. Write a response that appropriately
    completes the request.

    ### Instruction: {instruction}

    ### Input: {input}

    ### Response:
generation:
  temperature: 0.1
  max_new_tokens: 64
adapter:
  type: lora
quantization:
  bits: 4
preprocessing:
  global_max_sequence_length: 512
  split:
    type: random
    probabilities:
      - 0.95
      - 0
      - 0.05
trainer:
  type: finetune
  epochs: 1  # Typically, you want to set this to 3 epochs for instruction fine-tuning
  batch_size: 1
  eval_batch_size: 2
  optimizer:
    type: paged_adam
  gradient_accumulation_steps: 16
  learning_rate: 0.0004
  learning_rate_scheduler:
    decay: cosine
    warmup_fraction: 0.03
"""
)
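To make the prompt section of the configuration more concrete, the sketch below shows how the {instruction} and {input} placeholders could be filled from a single dataset row. The template wording is taken from the configuration; the rendering itself uses plain str.format as a stand-in and should not be read as Ludwig's internal implementation. The function name `render_prompt` and the example row are made up for this illustration.

```python
# Template wording copied from the configuration; the rendering logic is a
# stand-in based on str.format, not Ludwig's internal code.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n"
    "### Instruction: {instruction}\n"
    "### Input: {input}\n"
    "### Response:"
)


def render_prompt(row):
    """Fill the template placeholders from one dataset row (a dict)."""
    return PROMPT_TEMPLATE.format(
        instruction=row["instruction"],
        input=row["input"],
    )


# Hypothetical row, shaped like the Alpaca columns used in this article.
example_row = {
    "instruction": "Name two famous authors from the 18th century.",
    "input": "",
}
print(render_prompt(example_row))
```

Note that rows with an empty input still produce an "### Input:" line; the model simply sees no extra context there.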
To begin training, we create a LudwigModel object, passing it the YAML configuration defined previously and a logging level so we can track the fine-tuning. Then we call model.train().
Install the following transformers runtime if you get an error:
%pip install transformers==4.30.2
model = LudwigModel(
config=qlora_fine_tuning_config,
logging_level=logging.INFO
)
results = model.train(dataset=df[:5000])
In just two lines, we have started our LLM fine-tuning, using only the first 5,000 rows to save compute time and memory. Here, I used Kaggle’s P100 GPU as the accelerator, which you can also pick to speed up fine-tuning.
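The trainer and split settings determine how much work one epoch involves. The sketch below works through that arithmetic for the values used in this article. It assumes the split probabilities are ordered train, validation, test and that the random split lands close to the requested fractions; Ludwig's exact rounding and step accounting may differ. The function name `training_plan` is made up for this illustration.

```python
import math


def training_plan(num_rows, split_probabilities, batch_size, gradient_accumulation_steps):
    """Estimate split sizes and optimizer updates per epoch."""
    train_p, validation_p, _ = split_probabilities
    train_rows = round(num_rows * train_p)
    validation_rows = round(num_rows * validation_p)
    # Whatever is left after train and validation goes to the test split.
    test_rows = num_rows - train_rows - validation_rows
    # Gradients from several small batches are accumulated before each update.
    effective_batch_size = batch_size * gradient_accumulation_steps
    updates_per_epoch = math.ceil(train_rows / effective_batch_size)
    return {
        "train_rows": train_rows,
        "validation_rows": validation_rows,
        "test_rows": test_rows,
        "effective_batch_size": effective_batch_size,
        "optimizer_updates_per_epoch": updates_per_epoch,
    }


plan = training_plan(
    num_rows=5000,
    split_probabilities=(0.95, 0.0, 0.05),
    batch_size=1,
    gradient_accumulation_steps=16,
)
for name, value in plan.items():
    print(f"{name}: {value}")
```

With these values, the estimate is 4,750 training rows, 250 test rows, an effective batch size of 16, and 297 optimizer updates per epoch.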
test_examples = pd.DataFrame([
    {
        "instruction": "Name two famous authors from the 18th century.",
        "input": "",
    },
    {
        "instruction": "Develop a list of possible outcomes of given scenario",
        "input": "A fire has broken out in an old abandoned factory.",
    },
    {
        "instruction": "Tell me what you know about mountain ranges.",
        "input": "",
    },
    {
        "instruction": "Compose a haiku describing the summer.",
        "input": "",
    },
    {
        "instruction": "Analyze the given legal document and explain the key points.",
        # Adjacent string literals are concatenated into a single input string.
        "input": (
            'The following is an excerpt from a contract between two parties, '
            'labeled "Company A" and "Company B": \n\n"Company A agrees to '
            'provide reasonable assistance to Company B in ensuring the '
            'accuracy of the financial statements it provides. This includes '
            'allowing Company A reasonable access to personnel and other '
            'documents which may be necessary for Company B’s review. '
            'Company B agrees to maintain the document provided by Company A '
            'in confidence, and will not disclose the information to any '
            'third parties without Company A’s explicit permission."'
        ),
    },
])
predictions = model.predict(test_examples, generation_config={
"max_new_tokens": 64,
"temperature": 0.1})[0]
for input_with_prediction in zip(
    test_examples['instruction'],
    test_examples['input'],
    predictions['output_response']
):
    print(f"Instruction: {input_with_prediction[0]}")
    print(f"Input: {input_with_prediction[1]}")
    print(f"Generated Output: {input_with_prediction[2][0]}")
    print("\n\n")
Let us now upload the fine-tuned model to the Hugging Face Hub. Follow the steps below, starting by logging in with your access token:
huggingface-cli login --token <API_KEY>
Use the command below, replacing <repo-id> with your model repository ID and <model-path> with the local path to your saved model:
ludwig upload hf_hub --repo_id <repo-id> --model_path <model-path>
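If you prefer to trigger the upload from a script or notebook cell, one option is to build the same command in Python and hand it to subprocess. The sketch below only assembles and prints the command; the flags are the ones shown above, while the function name `build_upload_command`, the repository ID, and the model path are placeholders made up for this illustration.

```python
import shlex


def build_upload_command(repo_id, model_path):
    """Return the `ludwig upload hf_hub` command as an argument list."""
    return [
        "ludwig", "upload", "hf_hub",
        "--repo_id", repo_id,
        "--model_path", model_path,
    ]


# Hypothetical values; substitute your own repository ID and model directory.
command = build_upload_command("your-username/your-model", "path/to/saved/model")
print(shlex.join(command))

# To actually run the upload, pass the list to subprocess:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems when the model path contains spaces.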
This section expands on how the fine-tuning process can be adapted and extended for various applications, showcasing the flexibility and robustness of the Ludwig framework.
The code and configurations provided can be adapted to a wide range of NLP tasks beyond instruction tuning. Here’s how you can modify the process:
# Huggingface datasets and tokenizers
from datasets import load_dataset
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace
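As one illustration of adapting the setup to another task, the sketch below takes a trimmed version of the configuration used earlier and swaps in different column names and a different prompt. The summarization dataset with "article" and "summary" columns is hypothetical, as is the helper name `adapt_config`; only the configuration keys come from the example in this article.

```python
import copy

# A trimmed version of the configuration used earlier, as a plain dict.
base_config = {
    "model_type": "llm",
    "base_model": "mistralai/Mistral-7B-Instruct-v0.2",
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "prompt": {
        "template": "### Instruction: {instruction}\n### Input: {input}\n### Response:"
    },
    "adapter": {"type": "lora"},
    "quantization": {"bits": 4},
}


def adapt_config(config, input_column, output_column, template):
    """Return a modified copy of the config; the original is left untouched."""
    new_config = copy.deepcopy(config)
    new_config["input_features"] = [{"name": input_column, "type": "text"}]
    new_config["output_features"] = [{"name": output_column, "type": "text"}]
    new_config["prompt"]["template"] = template
    return new_config


# Hypothetical summarization dataset with "article" and "summary" columns.
summarization_config = adapt_config(
    base_config,
    input_column="article",
    output_column="summary",
    template="Summarize the following article.\n### Article: {article}\n### Summary:",
)
print(summarization_config["input_features"])
print(summarization_config["prompt"]["template"])
```

The adapter and quantization settings carry over unchanged, so only the data-facing parts of the configuration need to be rewritten for a new task.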
Ludwig’s low-code framework offers a streamlined pathway for fine-tuning Large Language Models (LLMs) to specific tasks, combining ease of use with powerful customization options. By utilizing Ludwig’s comprehensive feature set for model development, training, and evaluation, developers can create robust, high-performance AI models that are tailored to meet the demands of a wide array of real-world applications.
This extended guide provides a detailed walkthrough of the LLM fine-tuning process using Ludwig, covering both technical details and practical applications to ensure developers and researchers can fully leverage this powerful framework for their AI model development endeavors.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.