Ludwig: A Comprehensive Guide to LLM Fine Tuning using LoRA

Nitin Aggarwal Last Updated : 08 May, 2024


Introduction to Ludwig

The development of Large Language Models (LLMs) has significantly impacted Natural Language Processing (NLP) and Artificial Intelligence (AI). These models can understand and generate human-like text, enabling applications like chatbots and document summarization. However, to fully utilize their capabilities, they need to be fine-tuned for specific use cases. Ludwig, a low-code framework, is designed for creating custom AI models, including LLMs and other deep neural networks. This article provides a comprehensive guide to fine-tuning LLMs using Ludwig, focusing on creating state-of-the-art models for real-world scenarios.

Learning Outcomes

Understand the significance of fine-tuning Large Language Models (LLMs) for specific Natural Language Processing (NLP) and Artificial Intelligence (AI) use cases.
Learn about Ludwig, a low-code framework designed for creating custom AI models, including Large Language Models (LLMs) and deep neural networks.
Explore Ludwig’s key features, including training, fine-tuning, hyperparameter optimization, model visualization, and deployment.
Gain proficiency in preparing for LLM fine-tuning, including environment setup, data preparation, and YAML configuration.
Master the steps involved in fine-tuning LLMs using Ludwig, including model training, evaluation, and deployment.
Understand how to extend and adapt the fine-tuning process for various NLP tasks beyond instruction tuning, showcasing the flexibility of the Ludwig framework.

This article was published as a part of the Data Science Blogathon.

Introduction to Ludwig
Understanding Ludwig: A Low Code Framework For LLM Fine Tuning
Preparing for Fine-Tuning
Detailed Steps for Fine-Tuning LLMs with Ludwig
Deploy the Fine-tuned Model to HuggingFace
Extending and Adapting the Fine-Tuning Process
Conclusion

Understanding Ludwig: A Low Code Framework For LLM Fine Tuning

Ludwig, known for its user-friendly, low-code approach, supports a wide array of machine learning (ML) and deep learning applications. This flexibility makes it an ideal choice for developers and researchers aiming to build custom AI models without deep programming requirements. Ludwig’s capabilities include but are not limited to training, fine-tuning, hyperparameter optimization, model visualization, and deployment.

Key Features of Ludwig

Training and Fine-Tuning: Ludwig supports a range of training paradigms, including full training and fine-tuning of pre-trained models.
Model Configuration: Utilizing YAML files for configuration, Ludwig allows detailed specification of model parameters, making it highly customizable and flexible.
Hyperparameter Tuning: Ludwig integrates tools for automatic hyperparameter optimization, enhancing model performance.
Explainable AI: Tools within Ludwig provide insights into model decisions, promoting transparency.
Model Serving and Benchmarking: Ludwig makes it easy to serve models and benchmark their performance under different conditions.

Preparing for Fine-Tuning

Before we start, let’s get familiar with Ludwig and its ecosystem. As introduced earlier, Ludwig is a low-code framework for building custom AI models, such as Large Language Models and other deep neural networks. Ludwig can be used to train and fine-tune neural networks for a wide range of machine learning and deep learning use cases. It also supports visualizations, hyperparameter tuning, explainable AI, model benchmarking, and model serving.

Ludwig reads its settings from a YAML file that specifies the model name, the type of task to perform, the number of epochs to run when fine-tuning, the training hyperparameters, the quantization configuration, and so on. Ludwig supports a wide range of LLM-focused tasks, such as zero-shot batch inference, RAG, adapter-based fine-tuning for text generation, and instruction tuning. In this article, we will fine-tune the Mistral 7B model to follow human instructions. We will also explore how to define a YAML configuration for Ludwig.

It’s critical to understand the prerequisites and the setup required:

Environment Setup: Installing the necessary software and packages.
Data Preparation: Selecting and preprocessing the appropriate datasets.
YAML Configuration: Defining model parameters and training options in a YAML file.
Model Training and Evaluation: Executing the fine-tuning and assessing model performance.

Detailed Steps for Fine-Tuning LLMs with Ludwig

Setting Up the Development Environment: I ran this code in a VSCode environment, but it can also be run in a Kaggle notebook, on a Jupyter server, or in Google Colab.

Step1: Install Necessary Packages

Install the packages below. The final line, which pins the transformers version, is only needed if you get a Transformers version runtime error.

%pip install ludwig==0.10.0 ludwig[llm] 
%pip install torch==2.1.2 
%pip install PyYAML==6.0 
%pip install datasets==2.18.0 
%pip install pandas==2.1.4
%pip install transformers==4.30.2
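After installing, it can be useful to confirm that the environment actually matches the pinned versions. The following is a minimal sketch using only the standard library; the helper name check_versions is made up for this example, and the optional transformers pin is left out on purpose.

```python
from importlib import metadata

# Versions pinned in the install step above (transformers is optional, so it is omitted).
PINNED = {
    "ludwig": "0.10.0",
    "torch": "2.1.2",
    "PyYAML": "6.0",
    "datasets": "2.18.0",
    "pandas": "2.1.4",
}


def check_versions(pinned):
    """Return {package: (expected, installed or None)} for each pinned package."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        status = "ok" if installed == expected else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

Note that some builds report a local suffix (for example a CUDA tag on torch), which this simple equality check would flag as a mismatch even though the base version is the pinned one.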

Step2: Import Necessary Libraries and Dependencies

import yaml
import logging
import torch
import datasets
import pandas as pd
from ludwig.api import LudwigModel

Step3: Data Preparation and Pre-Processing

For this guide, we will use the Alpaca dataset from Stanford, specifically designed for instruction-based fine-tuning of LLMs. The dataset, created using OpenAI’s text-davinci-003 engine, comprises 52,000 entries with columns for the instruction, an optional input that provides context for the task, and the LLM output.

We’ll focus on the first 5,000 rows to manage computational demands efficiently. The dataset is accessed through Hugging Face’s datasets library and loaded into a pandas dataframe.

data = datasets.load_dataset("tatsu-lab/alpaca")
df = pd.DataFrame(data["train"])
df = df[["instruction", "input", "output"]]
df.head()
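Since the input column is optional, it may help to check how many rows in the subset actually carry one before training. Here is a minimal sketch on plain dictionaries; the rows are made-up stand-ins rather than real Alpaca records, and summarize_rows is a hypothetical helper. With the dataframe above you could pass df.to_dict("records") instead.

```python
def summarize_rows(rows, limit=5000):
    """Count rows with and without an `input` among the first `limit` rows."""
    subset = rows[:limit]
    with_input = sum(1 for row in subset if row["input"].strip())
    return {
        "total": len(subset),
        "with_input": with_input,
        "without_input": len(subset) - with_input,
    }


# Made-up rows that mimic the instruction/input/output columns.
sample_rows = [
    {"instruction": "Give three tips for staying healthy.", "input": "", "output": "..."},
    {"instruction": "Classify the sentiment.", "input": "I loved it.", "output": "Positive"},
    {"instruction": "Name a primary color.", "input": "", "output": "Red"},
]

print(summarize_rows(sample_rows, limit=2))
# {'total': 2, 'with_input': 1, 'without_input': 1}
```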

Step4: Create YAML Configuration

Create a YAML configuration file named model.yaml to set up a model for fine-tuning using Ludwig. The configuration includes:

Model Type: Identified as an LLM.

Base Model: Uses ‘mistralai/Mistral-7B-Instruct-v0.2’ from Hugging Face’s repository, although local model checkpoints can also be specified.
Input and Output Features: Defines ‘instruction’ and ‘output’ as text types for handling dataset inputs and model outputs respectively.
Prompt Template: Specifies how the instruction and input from each dataset row are formatted into the prompt that the model completes.
Text Generation Parameters: Sets the temperature to 0.1, which keeps randomness in response generation low, and max_new_tokens to 64, balancing response completeness and training efficiency.
Adapter and Quantization: Utilizes the LoRA adapter and 4-bit quantization to manage model size and computational efficiency.
Data Preprocessing: Sets global_max_sequence_length to 512 to standardize the length of input tokens and uses a random split for training and validation datasets with specific probabilities.
Trainer Settings: Configures the model to fine-tune for one epoch using a batch size of 1, with a paged_adam optimizer and a cosine learning rate scheduler, including a warmup phase.

This YAML configuration organizes and specifies all necessary parameters for effective model training and fine-tuning. For additional customization, refer to Ludwig’s documentation.
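To get a feel for what the trainer settings imply, here is a back-of-the-envelope sketch. It assumes the 5,000-row subset, the 0.95 training split, a batch size of 1, and the gradient_accumulation_steps value of 16 from the configuration in this article. The function training_plan is hypothetical, and Ludwig's random split and internal step counting may give slightly different numbers.

```python
import math


def training_plan(num_rows, train_fraction, batch_size, grad_accum_steps, epochs):
    """Estimate effective batch size and optimizer steps for a fine-tuning run."""
    train_rows = round(num_rows * train_fraction)
    # Gradients are accumulated over several small batches before each update.
    effective_batch = batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(train_rows / effective_batch)
    return {
        "train_rows": train_rows,
        "effective_batch_size": effective_batch,
        "optimizer_steps": steps_per_epoch * epochs,
    }


plan = training_plan(
    num_rows=5000, train_fraction=0.95, batch_size=1, grad_accum_steps=16, epochs=1
)
print(plan)
# {'train_rows': 4750, 'effective_batch_size': 16, 'optimizer_steps': 297}
```

A batch size of 1 keeps memory use low, while accumulating over 16 steps gives each weight update the statistics of a 16-example batch.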

Define Setting Inline Within YAML File

Below is an example of how to define these settings as an inline YAML string in Python, which yaml.safe_load parses into a configuration dictionary:

import os
import logging
import yaml
from ludwig.api import LudwigModel

# Set your Hugging Face authentication token here (as a string)
hugging_face_token = "<your_huggingface_api_token>"
os.environ["HUGGING_FACE_HUB_TOKEN"] = hugging_face_token

qlora_fine_tuning_config = yaml.safe_load(
"""
model_type: llm
base_model: mistralai/Mistral-7B-Instruct-v0.2

input_features:
  - name: instruction
    type: text

output_features:
  - name: output
    type: text

prompt:
  template: >-
    Below is an instruction that describes a task, paired with an input
    that provides further context. Write a response that appropriately
    completes the request.

    ### Instruction: {instruction}

    ### Input: {input}

    ### Response:

generation:
  temperature: 0.1
  max_new_tokens: 64

adapter:
  type: lora

quantization:
  bits: 4

preprocessing:
  global_max_sequence_length: 512
  split:
    type: random
    probabilities:
    - 0.95
    - 0
    - 0.05

trainer:
  type: finetune
  epochs: 1 # Typically, you want to set this to 3 epochs for instruction fine-tuning
  batch_size: 1
  eval_batch_size: 2
  optimizer:
    type: paged_adam
  gradient_accumulation_steps: 16
  learning_rate: 0.0004
  learning_rate_scheduler:
    decay: cosine
    warmup_fraction: 0.03
"""
)
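To see roughly what the model receives for each row, the prompt template can be filled in by hand. This sketch uses plain str.format as an approximation; it is not Ludwig's own rendering code, and the TEMPLATE string and render_prompt helper are assumptions written for this illustration (the folded YAML scalar is assumed to collapse each blank-line break into a single newline).

```python
# Same wording as the prompt template in the configuration above,
# written out with explicit newlines.
TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n"
    "### Instruction: {instruction}\n"
    "### Input: {input}\n"
    "### Response:"
)


def render_prompt(row):
    """Fill the template placeholders from one dataset row."""
    return TEMPLATE.format(instruction=row["instruction"], input=row["input"])


example_row = {
    "instruction": "Develop a list of possible outcomes of given scenario",
    "input": "A fire has broken out in an old abandoned factory.",
}
print(render_prompt(example_row))
```

Rows with an empty input still produce the "### Input:" line, just with nothing after it, which is worth keeping in mind when writing prompts at inference time.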

Step5: LLM Fine Tuning with LoRA (Low Rank Adaptation)

To begin training, create a LudwigModel object, passing the YAML configuration defined previously along with a logging level to track the fine-tuning. Then call model.train() on it.

Install the following transformers runtime if you get an error:

%pip install transformers==4.30.2

model = LudwigModel(
  config=qlora_fine_tuning_config, 
  logging_level=logging.INFO
  )

results = model.train(dataset=df[:5000])

In just two lines, we have initialized our LLM fine-tuning. We use only the first 5,000 rows to save compute time and memory. Here, I used Kaggle’s P100 GPU as the accelerator, which you can also pick to speed up fine-tuning.
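The reason LoRA makes this feasible on a single GPU is that it freezes each original weight matrix and trains only two small low-rank factors. The sketch below compares the parameter counts. The 4096 x 4096 matrix size and the rank of 8 are assumptions chosen for illustration; the configuration in this article does not set a rank, so the value Ludwig actually uses should be checked in its documentation.

```python
def lora_param_counts(d_out, d_in, rank):
    """Compare full-matrix parameters with LoRA's two low-rank factors.

    LoRA freezes the original d_out x d_in weight W and trains
    B (d_out x rank) and A (rank x d_in), so that W' = W + B @ A
    (up to a scaling factor).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Assumed values for illustration: a 4096 x 4096 projection and rank 8.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full matrix: {full:,} parameters")
print(f"LoRA factors: {lora:,} parameters ({lora / full:.2%} of full)")
# full matrix: 16,777,216 parameters
# LoRA factors: 65,536 parameters (0.39% of full)
```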

Step6: Evaluating the Model’s Performance

test_examples = pd.DataFrame([
    {
        "instruction": "Name two famous authors from the 18th century.",
        "input": "",
    },
    {
        "instruction": "Develop a list of possible outcomes of given scenario",
        "input": "A fire has broken out in an old abandoned factory.",
    },
    {
        "instruction": "Tell me what you know about mountain ranges.",
        "input": "",
    },
    {
        "instruction": "Compose a haiku describing the summer.",
        "input": "",
    },
    {
        "instruction": "Analyze the given legal document and explain the key points.",
        "input": (
            'The following is an excerpt from a contract between two parties, '
            'labeled "Company A" and "Company B": \n\n"Company A agrees to '
            'provide reasonable assistance to Company B in ensuring the '
            'accuracy of the financial statements it provides. This includes '
            'allowing Company A reasonable access to personnel and other '
            'documents which may be necessary for Company B’s review. '
            'Company B agrees to maintain the document provided by Company A '
            'in confidence, and will not disclose the information to any '
            'third parties without Company A’s explicit permission.'
        ),
    },
])

# predict() returns a tuple; the first element holds the predictions
predictions = model.predict(
    test_examples,
    generation_config={"max_new_tokens": 64, "temperature": 0.1},
)[0]

for instruction, input_text, response in zip(
    test_examples["instruction"],
    test_examples["input"],
    predictions["output_response"],
):
    print(f"Instruction: {instruction}")
    print(f"Input: {input_text}")
    print(f"Generated Output: {response[0]}")
    print("\n\n")
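Printing is fine for a handful of examples, but for a larger review it can be easier to collect the results into records first. This is a minimal sketch with a hypothetical collect_results helper; the responses shown are made up, and each response is assumed to be a list whose first element is the generated text, as in the loop above.

```python
def collect_results(instructions, inputs, responses):
    """Pair each prompt with the first generated response for easier review."""
    records = []
    for instruction, input_text, response in zip(instructions, inputs, responses):
        # Each response is assumed to be a list; an empty list yields "".
        text = response[0] if response else ""
        records.append(
            {"instruction": instruction, "input": input_text, "response": text.strip()}
        )
    return records


# Made-up values standing in for the test examples and model outputs.
demo = collect_results(
    ["Compose a haiku describing the summer."],
    [""],
    [["  Warm light on the sea  "]],
)
print(demo[0]["response"])
```

The resulting list of dictionaries can be wrapped in pd.DataFrame(...) and saved to CSV for side-by-side comparison across runs.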

Deploy the Fine-tuned Model to HuggingFace

Let us now deploy the fine-tuned model to HuggingFace. Follow the below steps:

Step1: Create a Model Repository on Hugging Face

Navigate to the Hugging Face website and log in
Click on your profile icon and select “New Model.”
Fill in the necessary details and specify a name for your model.

Step2: Generate a Hugging Face API Key

Still on the Hugging Face website, click your profile icon, then go to “Settings.”
Select “Access Tokens” and click on “New Token.”
Choose “Write” access when generating the token

Step3: Authenticate with Hugging Face CLI

Open your command line interface
Use the following command to log in, replacing <API_KEY> with your generated API key

huggingface-cli login --token <API_KEY>

Step4: Upload Your Model to Hugging Face

Use the command below, replacing <repo-id> with your model repository ID and <model-path> with the local path to your saved model:

ludwig upload hf_hub --repo_id <repo-id> --model_path <model-path>
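If you prefer to trigger the upload from a script, the same command can be assembled in Python. The sketch below only builds the argument list; build_upload_command is a hypothetical helper, the repository ID and model path are placeholders, and the check that a repository ID has the form user/name is an assumption of this example.

```python
import shlex


def build_upload_command(repo_id, model_path):
    """Return the argument list for the `ludwig upload hf_hub` command."""
    if repo_id.count("/") != 1 or not all(repo_id.split("/")):
        raise ValueError("repo_id should look like '<user-or-org>/<model-name>'")
    return ["ludwig", "upload", "hf_hub", "--repo_id", repo_id, "--model_path", model_path]


# Hypothetical repository ID and model path, for illustration only.
cmd = build_upload_command("my-user/mistral-7b-alpaca-lora", "results/api_experiment_run")
print(shlex.join(cmd))
```

The list can then be handed to subprocess.run(cmd, check=True), which avoids shell quoting issues with paths that contain spaces.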

Extending and Adapting the Fine-Tuning Process

This section expands on how the fine-tuning process can be adapted and extended for various applications, showcasing the flexibility and robustness of the Ludwig framework.

The code and configurations provided can be adapted to a wide range of NLP tasks beyond instruction tuning. Here’s how you can modify the process:

Data Source Flexibility: Adjust the data preparation step to incorporate different datasets as needed for your specific task.

# Huggingface datasets and tokenizers
from datasets import load_dataset
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace

Task Customization: Modify the YAML configuration to reflect the new task requirements by changing the input and output features and adapting the prompt template as necessary.
Model Selection and Adaptation: Choose a different base model from Hugging Face’s model repository that better suits the new task, adjusting the model parameters accordingly.
Hyperparameter Optimization: Utilize Ludwig’s built-in tools for hyperparameter tuning to optimize the model further based on the new task’s specific needs.
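As one way to picture the task customization and model selection steps, the sketch below derives a new configuration dictionary from a trimmed copy of the one used in this article. The adapt_config helper, the summarization task, and the column names article and summary are all hypothetical and would need to match your own dataset.

```python
import copy

# Trimmed-down version of the configuration used earlier in this article.
BASE_CONFIG = {
    "model_type": "llm",
    "base_model": "mistralai/Mistral-7B-Instruct-v0.2",
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "prompt": {"template": "### Instruction: {instruction}\n### Response:"},
    "adapter": {"type": "lora"},
    "quantization": {"bits": 4},
}


def adapt_config(base, input_column, output_column, template, base_model=None):
    """Return a copy of `base` rewired for a different task and dataset."""
    config = copy.deepcopy(base)  # leave the original untouched
    config["input_features"][0]["name"] = input_column
    config["output_features"][0]["name"] = output_column
    config["prompt"]["template"] = template
    if base_model is not None:
        config["base_model"] = base_model
    return config


summarization_config = adapt_config(
    BASE_CONFIG,
    input_column="article",   # hypothetical column name
    output_column="summary",  # hypothetical column name
    template="Summarize the following article.\n\n### Article: {article}\n\n### Summary:",
)
print(summarization_config["input_features"], summarization_config["output_features"])
```

The resulting dictionary could be passed to LudwigModel(config=...) in the same way as the parsed YAML earlier in this guide.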

Conclusion

Ludwig’s low-code framework offers a streamlined pathway for fine-tuning Large Language Models (LLMs) to specific tasks, combining ease of use with powerful customization options. By utilizing Ludwig’s comprehensive feature set for model development, training, and evaluation, developers can create robust, high-performance AI models that are tailored to meet the demands of a wide array of real-world applications.

Key Takeaways

Ludwig is a low-code framework designed for creating custom AI models, including Large Language Models (LLMs) and deep neural networks, making AI development more accessible to developers and researchers.
Fine-tuning LLMs using Ludwig involves steps such as environment setup, data preparation, YAML configuration, model training, evaluation, and deployment.
Ludwig offers key features such as training, fine-tuning, hyperparameter optimization, model visualization, and deployment, providing a comprehensive solution for AI model development.
By leveraging Ludwig’s capabilities, developers can create robust and high-performance AI models tailored to specific use cases, such as document summarization, chatbots, and instruction-based tasks.
The flexibility of Ludwig allows for the adaptation and extension of the fine-tuning process to various NLP tasks beyond instruction tuning, ensuring versatility in AI model development.

References and Further Reading

Hugging Face Repo
Git Repo
Ludwig AI Framework Documentation
Hugging Face Base Model
Stanford Alpaca Dataset

This extended guide provides a detailed walkthrough of the LLM fine-tuning process using Ludwig, covering both technical details and practical applications to ensure developers and researchers can fully leverage this powerful framework for their AI model development endeavors.

