Harnessing NLP Superpowers: A Step-by-Step Hugging Face Fine Tuning Tutorial

kajal Last Updated : 07 Jan, 2025


Introduction

Fine-tuning a natural language processing (NLP) model means continuing to train a pre-trained model on a task-specific dataset so that its weights adapt to the new task. In practice, you also choose hyperparameters such as the learning rate, the batch size, and the number of epochs, and you prepare the dataset in the format the model expects. Fine-tuning can be time-consuming and demands a firm grasp of both the model and the task. This article looks at how to fine-tune a Hugging Face model.


Learning Objectives

Understand the T5 model’s structure, including Transformers and self-attention.
Learn to optimize hyperparameters for better model performance.
Master text data preparation, including tokenization and formatting.
Know how to adapt pre-trained models to specific tasks.
Learn to clean, split, and create datasets for training.
Gain experience in model training and evaluation using metrics like loss and accuracy.
Explore real-world applications of the fine-tuned model for generating responses or answers.

This article was published as a part of the Data Science Blogathon.

About Hugging Face Models
Import Necessary Libraries
Import Dataset
Problem Statement
Initialize Parameters
T5 Transformer
T5Tokenizer
Dataset Preparation
DataLoader
Model Building
Model Training
Model Prediction
Prediction
Frequently Asked Questions

About Hugging Face Models

Hugging Face is a firm that provides a platform for natural language processing (NLP) model training and deployment. The platform hosts a model library suitable for various NLP tasks, including language translation, text generation, and question-answering. These models undergo training on extensive datasets and are designed to excel in a wide range of natural language processing (NLP) activities.

The Hugging Face platform also includes tools for fine-tuning pre-trained models on specific datasets, which helps adapt models to particular domains or languages. The platform also has APIs for accessing and using pre-trained models in apps, and tools for building custom models and deploying them to the cloud.

Using the Hugging Face library for natural language processing (NLP) tasks has various advantages:

Wide selection of models: A significant range of pre-trained NLP models are available through the Hugging Face library, including models trained on tasks such as language translation, question answering, and text categorization. This makes it simple to choose a model that meets your exact requirements.
Compatibility across platforms: The Hugging Face library works with standard deep learning frameworks such as PyTorch, TensorFlow, and JAX, making it simple to integrate into your existing workflow.
Simple fine-tuning: The Hugging Face library contains tools for fine-tuning pre-trained models on your dataset, saving you time and effort over training a model from scratch.
Active community: The Hugging Face library has a vast and active user community, which means you can obtain assistance and support and contribute to the library’s growth.
Well-documented: The Hugging Face library contains extensive documentation, making it easy to start and learn how to use it efficiently.

Import Necessary Libraries

Importing necessary libraries is analogous to constructing a toolkit for a particular programming and data analysis activity. These libraries, which are frequently pre-written collections of code, offer a wide range of functions and tools that help to speed development. Developers and data scientists can access new capabilities, increase productivity, and use existing solutions by importing the appropriate libraries.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


import torch

from transformers import T5Tokenizer
from transformers import T5ForConditionalGeneration
from torch.optim import AdamW  # the AdamW class in transformers is deprecated; use PyTorch's

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

pl.seed_everything(100)

import warnings
warnings.filterwarnings("ignore")

Import Dataset

Importing a dataset is a crucial initial step in data-driven projects.

df = pd.read_csv("/kaggle/input/queestion-answer-dataset-qa/train.csv")
df.columns

df = df[['context','question', 'text']]
print("Number of records: ", df.shape[0])

Problem Statement

“To create a model capable of generating responses based on context and questions.”

For example,

Context = “Clustering groups of similar cases, for example, can find similar patients or use for customer segmentation in the banking field. The association technique is used for finding items or events that often co-occur, for example, grocery items that a particular customer usually buys together. Anomaly detection is used to discover abnormal and unusual cases; for example, credit card fraud detection.”

Question = “What is the example of Anomaly detection?”

Answer = ????????????????????????????????

df["context"] = df["context"].str.lower()
df["question"] = df["question"].str.lower()
df["text"] = df["text"].str.lower()

df.head()

Initialize Parameters

Input length: The maximum number of input tokens (the subword units produced by the tokenizer) in a single example fed into the model. Longer inputs are truncated and shorter ones are padded to this length.
Output length: The maximum number of tokens in the target sequence of a single example. Here it caps the length of the answer the model learns to generate.
Training batch size: During training, the model processes several samples at once. If you set the training batch size to 32, the model handles 32 instances, such as 32 phrases, in one forward and backward pass before updating its weights.
Validation batch size: Similar to the training batch size, this parameter indicates the number of instances the model handles at once during the validation phase, when it is evaluated on a hold-out dataset.
Epochs: An epoch is a single pass through the complete training dataset. If the training dataset comprises 1000 instances and the training batch size is 32, one epoch needs 32 training steps (1000 / 32 = 31.25, rounded up, with the last batch holding only 8 instances). A model trained for ten epochs will have processed 10 * 1000 = 10,000 instances.
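As a quick check of the arithmetic above, the number of training steps per epoch is the dataset size divided by the batch size, rounded up. The sketch below is only illustrative: the helper name steps_per_epoch is hypothetical, and the second example assumes 8,000 training rows, which is what an 80/20 split of 10,000 rows would give.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


# The example from the text: 1000 instances, batch size 32.
print(steps_per_epoch(1000, 32))  # 32

# Assumed setup: 8,000 training rows, batch size 8, 5 epochs.
per_epoch = steps_per_epoch(8000, 8)
print(per_epoch)      # 1000
print(per_epoch * 5)  # 5000 steps in total
```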

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 
INPUT_MAX_LEN = 512 # Input length
OUT_MAX_LEN = 128 # Output Length
TRAIN_BATCH_SIZE = 8 # Training Batch Size
VALID_BATCH_SIZE = 2 # Validation Batch Size
EPOCHS = 5 # Number of epochs

T5 Transformer

The T5 model is based on the Transformer architecture, a neural network designed to handle sequential input data effectively. It comprises an encoder and a decoder, which include a sequence of interconnected “layers.”

The encoder and decoder layers comprise various “attention” mechanisms and “feedforward” networks. The attention mechanisms enable the model to focus on different sections of the input sequence at different times, while the feedforward networks transform the resulting representations using a set of weights and biases.

The T5 model also employs “self-attention,” which allows each element in the input sequence to pay attention to every other element. This allows the model to recognize links between words and phrases in the input data, which is critical for many NLP applications.
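To make the idea of every element attending to every other element concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification and not T5's implementation: it assumes the queries, keys, and values are the input vectors themselves, whereas real Transformer layers use learned projections and multiple heads. The toy vectors and function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Self-attention where queries, keys, and values are all x (no learned weights)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token in the sequence, itself included.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted average of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return outputs, weights


tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # three toy token vectors
out, attn = self_attention(tokens)
for row in attn:
    print([round(a, 3) for a in row])
```

Each printed row holds one token's attention weights over the whole sequence. The first token gives more weight to the second token, which points in a similar direction, than to the third.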

In addition to the encoder and decoder, the T5 model contains a “language model head,” which predicts the next token in a sequence based on the prior tokens. This is critical for translation and text generation tasks, where the model must produce cohesive and natural-sounding output.

The T5 model represents a large and sophisticated neural network designed for highly efficient and accurate processing of sequential input. It has undergone extensive training on a diverse text dataset and can proficiently perform a broad spectrum of natural language processing tasks.

T5Tokenizer

T5Tokenizer turns a text into a list of tokens. Each token is a subword unit, so a common word may map to a single token while a rare word is split into several pieces.

The T5Tokenizer is built on SentencePiece. It segments the input text into subword pieces drawn from a vocabulary that was learned from the frequency of character sequences in its training data. This helps the tokenizer deal with out-of-vocabulary (OOV) terms that do not occur in the training data but do appear in the test data, since an unseen word can still be represented as a sequence of known pieces.

The T5Tokenizer also adds special tokens to the text. It appends </s> to mark the end of a sequence, uses <pad> to indicate padding, and maps pieces it cannot represent to <unk>.

MODEL_NAME = "t5-base"

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME, model_max_length= INPUT_MAX_LEN)

print("eos_token: {} and id: {}".format(tokenizer.eos_token,
                   tokenizer.eos_token_id)) # End-of-sequence token (eos_token)
print("unk_token: {} and id: {}".format(tokenizer.unk_token,
                   tokenizer.unk_token_id)) # Unknown token (unk_token)
print("pad_token: {} and id: {}".format(tokenizer.pad_token,
                 tokenizer.pad_token_id)) # Pad token (pad_token)
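The toy sketch below illustrates the ideas described for the tokenizer: splitting words into known subword pieces, falling back to an unknown token, appending an end-of-sequence token, and padding to a fixed length. It is not how SentencePiece works internally. The vocabulary, the greedy longest-match rule, and the encode helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy vocabulary. Ids 0, 1, 2 follow the <pad>, </s>, <unk> layout;
# "_" is a stand-in for the word-boundary marker that SentencePiece uses.
VOCAB = {"<pad>": 0, "</s>": 1, "<unk>": 2,
         "_fine": 3, "_tun": 4, "ing": 5, "_model": 6, "s": 7}


def encode(text, max_len):
    """Greedy longest-match subword encoding with end-of-sequence and padding."""
    ids = []
    for word in text.lower().split():
        piece = "_" + word
        start = 0
        while start < len(piece):
            # Try the longest remaining substring first, then shorter ones.
            for end in range(len(piece), start, -1):
                if piece[start:end] in VOCAB:
                    ids.append(VOCAB[piece[start:end]])
                    start = end
                    break
            else:
                # No known piece starts here: emit <unk> and skip one character.
                ids.append(VOCAB["<unk>"])
                start += 1
    ids = ids[:max_len - 1] + [VOCAB["</s>"]]              # truncate, then add </s>
    return ids + [VOCAB["<pad>"]] * (max_len - len(ids))   # pad to max_len


print(encode("tuning models", max_len=8))  # [4, 5, 6, 7, 1, 0, 0, 0]
```

Neither "tuning" nor "models" is in the toy vocabulary as a whole word, yet both are encoded from known pieces, which is the point of subword tokenization.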

Dataset Preparation

When working with PyTorch, you usually prepare your data for the model with a dataset class. The dataset class is responsible for loading the data and carrying out the required preparation steps, such as tokenization and conversion to token IDs. The class should implement the __getitem__ method, which returns a single item from the dataset by index.

The __init__ method stores the context, question, and target lists along with the tokenizer and the maximum lengths. The __len__ method returns the number of samples in the dataset. The __getitem__ method accepts an index and returns the tokenized input and labels for that sample.

It is also customary to include preprocessing steps such as padding and truncating the tokenized inputs, and to return the labels as tensors.

class T5Dataset(torch.utils.data.Dataset):

    def __init__(self, context, question, target):
        self.context = context
        self.question = question
        self.target = target
        self.tokenizer = tokenizer
        self.input_max_len = INPUT_MAX_LEN
        self.out_max_len = OUT_MAX_LEN

    def __len__(self):
        return len(self.context)

    def __getitem__(self, item):
        context = str(self.context[item])
        context = " ".join(context.split())

        question = str(self.question[item])
        question = " ".join(question.split())

        target = str(self.target[item])
        target = " ".join(target.split())
        
        
        inputs_encoding = self.tokenizer(
            context,
            question,
            add_special_tokens=True,
            max_length=self.input_max_len,
            padding = 'max_length',
            truncation='only_first',
            return_attention_mask=True,
            return_tensors="pt"
        )
        

        output_encoding = self.tokenizer(
            target,
            None,
            add_special_tokens=True,
            max_length=self.out_max_len,
            padding = 'max_length',
            truncation= True,
            return_attention_mask=True,
            return_tensors="pt"
        )


        inputs_ids = inputs_encoding["input_ids"].flatten()
        attention_mask = inputs_encoding["attention_mask"].flatten()
        labels = output_encoding["input_ids"]

        # Replace padding ids with -100 so the loss ignores them (as per T5 documentation)
        labels[labels == self.tokenizer.pad_token_id] = -100

        labels = labels.flatten()

        out = {
            "context": context,
            "question": question,
            "answer": target,
            "inputs_ids": inputs_ids,
            "attention_mask": attention_mask,
            "targets": labels
        }


        return out
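The label handling in __getitem__ is easy to miss, so here is a small standalone sketch of it in plain Python. Target ids are padded or truncated to a fixed length, and the padding positions are then replaced with -100, the index that PyTorch's cross-entropy loss ignores by default. The helper names and the token ids are hypothetical.

```python
PAD_ID = 0           # T5's padding token id
IGNORE_INDEX = -100  # positions with this label do not contribute to the loss


def pad_or_truncate(ids, max_len, pad_id=PAD_ID):
    """Cut the sequence to max_len, then right-pad it with pad_id."""
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))


def mask_padding(labels, pad_id=PAD_ID):
    """Replace padding ids with IGNORE_INDEX so they are skipped by the loss."""
    return [IGNORE_INDEX if t == pad_id else t for t in labels]


target_ids = [363, 19, 1]  # hypothetical answer token ids ending in </s> (id 1)
labels = mask_padding(pad_or_truncate(target_ids, max_len=6))
print(labels)  # [363, 19, 1, -100, -100, -100]
```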

DataLoader

The DataLoader class loads data in batches and, optionally, in parallel worker processes, which makes it practical to work with datasets too large to process in one go. You combine the DataLoader class with a dataset class that contains the data to be loaded.

While training a transformer model, the dataloader is in charge of iterating over the dataset and returning one batch of data at a time to the model for training or evaluation. The DataLoader class offers various parameters to control the loading of data, including the batch size, the number of worker processes, and whether to shuffle the data before each epoch.

class T5DatasetModule(pl.LightningDataModule):

    def __init__(self, df_train, df_valid):
        super().__init__()
        self.df_train = df_train
        self.df_valid = df_valid
        self.tokenizer = tokenizer
        self.input_max_len = INPUT_MAX_LEN
        self.out_max_len = OUT_MAX_LEN


    def setup(self, stage=None):

        self.train_dataset = T5Dataset(
        context=self.df_train.context.values,
        question=self.df_train.question.values,
        target=self.df_train.text.values
        )

        self.valid_dataset = T5Dataset(
        context=self.df_valid.context.values,
        question=self.df_valid.question.values,
        target=self.df_valid.text.values
        )

    def train_dataloader(self):
        return torch.utils.data.DataLoader(
         self.train_dataset,
         batch_size= TRAIN_BATCH_SIZE,
         shuffle=True, 
         num_workers=4
        )


    def val_dataloader(self):
        return torch.utils.data.DataLoader(
         self.valid_dataset,
         batch_size= VALID_BATCH_SIZE,
         num_workers=1
        )
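To show what batching and shuffling amount to, here is a minimal pure-Python stand-in for the iteration a dataloader performs. It leaves out everything that makes the real DataLoader useful at scale, such as worker processes and tensor collation. The function name iterate_batches and the example data are hypothetical.

```python
import random


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of up to batch_size items, optionally in shuffled order."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


examples = [f"example-{i}" for i in range(10)]

batches = list(iterate_batches(examples, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]

# With shuffle=True the order changes, but every example still appears once.
shuffled = [x for b in iterate_batches(examples, 4, shuffle=True, seed=100) for x in b]
print(sorted(shuffled) == sorted(examples))  # True
```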

Model Building

When creating a model in PyTorch, you usually begin by defining a new class that derives from torch.nn.Module. Here the class derives from pl.LightningModule, a PyTorch Lightning subclass of torch.nn.Module that also holds the training and validation logic. The class’s __init__ method defines the model’s architecture, often by instantiating the model’s layers and assigning them as class attributes.

The forward method is in charge of passing data through the model in the forward direction. It accepts the input data, applies the model’s layers, and returns the result.

In this tutorial, the __init__ method does not build layers by hand. It loads the pre-trained T5ForConditionalGeneration model and assigns it as a class attribute. The forward method accepts the input IDs, attention mask, and labels, passes them to the model, and returns the loss and logits. Training a transformer model typically alternates between two stages: training and validation.

The training_step method specifies the logic for a single training step. A full step generally includes the items below; with PyTorch Lightning, training_step itself runs the forward pass and returns the loss, and the Trainer then computes the gradients and updates the parameters:

forward pass through the model
computing the loss
computing gradients
Updating the model’s parameters

The validation_step method, like the training_step method, runs on a single batch, but it is used to assess the model on a validation set. It usually includes:

forward pass through the model
computing the evaluation metrics
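The steps listed above can be sketched without any deep learning library. The example below uses a one-weight linear model in place of T5 and writes the gradient out by hand, so every stage of a training step is visible. The function names echo the Lightning hooks but are hypothetical standalone functions, and the data is made up.

```python
def training_step(w, batch, lr):
    """One step: forward pass, loss, gradient, parameter update."""
    xs, ys = batch
    preds = [w * x for x in xs]                                               # forward pass
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)             # mean squared error
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)   # d(loss)/dw
    return w - lr * grad, loss                                                # gradient descent update


def validation_step(w, batch):
    """Forward pass and metric only; the weight is not changed."""
    xs, ys = batch
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_batch = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # generated from y = 2x
valid_batch = ([4.0], [8.0])

w = 0.0
for step in range(50):
    w, train_loss = training_step(w, train_batch, lr=0.05)

print(round(w, 4), round(validation_step(w, valid_batch), 6))  # w approaches 2.0
```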

class T5Model(pl.LightningModule):
    
    def __init__(self):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME, return_dict=True)

    def forward(self, input_ids, attention_mask, labels=None):

        output = self.model(
            input_ids=input_ids, 
            attention_mask=attention_mask, 
            labels=labels
        )

        return output.loss, output.logits


    def training_step(self, batch, batch_idx):

        input_ids = batch["inputs_ids"]
        attention_mask = batch["attention_mask"]
        labels= batch["targets"]
        loss, outputs = self(input_ids, attention_mask, labels)

        
        self.log("train_loss", loss, prog_bar=True, logger=True)

        return loss

    def validation_step(self, batch, batch_idx):
        input_ids = batch["inputs_ids"]
        attention_mask = batch["attention_mask"]
        labels= batch["targets"]
        loss, outputs = self(input_ids, attention_mask, labels)

        self.log("val_loss", loss, prog_bar=True, logger=True)
        
        return loss


    def configure_optimizers(self):
        return AdamW(self.parameters(), lr=0.0001)

Model Training

Training a transformer model usually means iterating over the dataset in batches, passing each batch through the model, and updating the model’s parameters from the computed gradients according to the optimizer’s update rule.

def run():
    
    df_train, df_valid = train_test_split(
        df[0:10000], test_size=0.2, random_state=101
    )
    
    df_train = df_train.fillna("none")
    df_valid = df_valid.fillna("none")
    
    df_train['context'] = df_train['context'].apply(lambda x: " ".join(x.split()))
    df_valid['context'] = df_valid['context'].apply(lambda x: " ".join(x.split()))
    
    df_train['text'] = df_train['text'].apply(lambda x: " ".join(x.split()))
    df_valid['text'] = df_valid['text'].apply(lambda x: " ".join(x.split()))
    
    df_train['question'] = df_train['question'].apply(lambda x: " ".join(x.split()))
    df_valid['question'] = df_valid['question'].apply(lambda x: " ".join(x.split()))

   
    df_train = df_train.reset_index(drop=True)
    df_valid = df_valid.reset_index(drop=True)
    
    dataModule = T5DatasetModule(df_train, df_valid)
    dataModule.setup()

    device = DEVICE
    models = T5Model()
    models.to(device)

    checkpoint_callback  = ModelCheckpoint(
        dirpath="/kaggle/working",
        filename="best_checkpoint",
        save_top_k=2,
        verbose=True,
        monitor="val_loss",
        mode="min"
    )

    trainer = pl.Trainer(
        callbacks=[checkpoint_callback],
        max_epochs=EPOCHS,
        accelerator="gpu",
        devices=1  # replaces the deprecated gpus=1 argument
    )

    trainer.fit(models, dataModule)

run()

Model Prediction

To make predictions with a fine-tuned NLP model like T5 using new input, you can follow these steps:

Preprocess the New Input: Tokenize and preprocess your new input text to match the preprocessing you applied to your training data. Ensure that it is in the correct format expected by the model.
Use the Fine-Tuned Model for Inference: Load your fine-tuned T5 model, which you previously trained or loaded from a checkpoint.
Generate Predictions: Pass the preprocessed new input to the model for prediction. In the case of T5, you can use the generate method to generate responses.

# Adjust this path to the checkpoint file that your training run actually wrote to /kaggle/working
train_model = T5Model.load_from_checkpoint("/kaggle/working/best_checkpoint-v1.ckpt")

train_model.freeze()

def generate_question(context, question):

    inputs_encoding =  tokenizer(
        context,
        question,
        add_special_tokens=True,
        max_length= INPUT_MAX_LEN,
        padding = 'max_length',
        truncation='only_first',
        return_attention_mask=True,
        return_tensors="pt"
        )

    
    generate_ids = train_model.model.generate(
        input_ids = inputs_encoding["input_ids"],
        attention_mask = inputs_encoding["attention_mask"],
        max_length = OUT_MAX_LEN,  # cap the generated answer at the output length
        num_beams = 4,
        num_return_sequences = 1,
        no_repeat_ngram_size=2,
        early_stopping=True,
        )

    preds = [
        tokenizer.decode(gen_id,
        skip_special_tokens=True, 
        clean_up_tokenization_spaces=True)
        for gen_id in generate_ids
    ]

    return "".join(preds)
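The num_beams argument passed to generate above turns on beam search, which keeps several partial outputs alive instead of committing to the single most likely next token. The sketch below shows the idea on a tiny, made-up next-token table. It is a simplified illustration rather than the library's implementation: it scores sequences by raw probability with no length penalty, and the table NEXT and the function beam_search are hypothetical.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
NEXT = {
    "<start>": {"credit": 0.6, "fraud": 0.4},
    "credit": {"card": 0.9, "</s>": 0.1},
    "card": {"fraud": 0.7, "</s>": 0.3},
    "fraud": {"detection": 0.8, "</s>": 0.2},
    "detection": {"</s>": 1.0},
}


def beam_search(num_beams, max_length):
    """Keep the num_beams most probable partial sequences at every step."""
    beams = [(0.0, ["<start>"])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_length):
        candidates = []
        for score, tokens in beams:
            for tok, p in NEXT[tokens[-1]].items():
                candidates.append((score + math.log(p), tokens + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:num_beams]:
            if cand[1][-1] == "</s>":
                finished.append(cand)  # this sequence is complete
            else:
                beams.append(cand)     # this one keeps growing
        if not beams:
            break
    best = max(finished + beams, key=lambda c: c[0])
    return [t for t in best[1] if t not in ("<start>", "</s>")]


print(beam_search(num_beams=1, max_length=6))  # ['credit', 'card', 'fraud', 'detection']
print(beam_search(num_beams=2, max_length=6))  # ['fraud', 'detection']
```

With a single beam the search is greedy and ends on a sequence with probability about 0.30. With two beams it also keeps the second-best first token and finds a sequence with probability 0.32.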

Prediction

Let’s generate predictions using the fine-tuned T5 model with new input:

context = "Clustering groups of similar cases, for example, \
can find similar patients, or use for customer segmentation in the \
banking field. Using association technique for finding items or events that \
often co-occur, for example, grocery items that are usually bought together \
by a particular customer. Using anomaly detection to discover abnormal \
and unusual cases, for example, credit card fraud detection."

que = "what is the example of Anomaly detection?"

print(generate_question(context, que))

context = "Classification is used when your target is categorical, \
while regression is used when your target variable \
is continuous. Both classification and regression belong to the category \
of supervised machine learning algorithms."

que = "When is classification used?"

print(generate_question(context, que))

Conclusion

In this article, we embarked on a journey to fine-tune a natural language processing (NLP) model, specifically the T5 model, for a question-answering task. Throughout this process, we delved into various NLP model development and deployment aspects.

Key takeaways:

Explored the encoder-decoder structure and self-attention mechanisms that underpin T5’s capabilities.
Saw that hyperparameter tuning is an essential skill for optimizing model performance.
Set the learning rate, batch sizes, and sequence lengths that govern how the model is fine-tuned.
Practiced tokenization, padding, and converting raw text data into a suitable format for model input.
Delved into fine-tuning, including loading pre-trained weights and adapting them to a specific task.
Learned how to clean and structure data, splitting it into training and validation sets.
Demonstrated how the fine-tuned model can generate responses or answers based on input context and questions, showcasing its real-world utility.

Frequently Asked Questions

Q1. What is fine-tuning in natural language processing (NLP)?

Answer: Fine-tuning in NLP means continuing to train a pre-trained model on a task-specific dataset, with suitable hyperparameters, to optimize its performance for that task.

Q2. What is the Transformer architecture used in NLP models like T5?

Answer: The Transformer architecture is a neural network architecture. It excels at handling sequential data and is the foundation for models like T5. It uses self-attention mechanisms for context understanding.

Q3. What is the purpose of the encoder-decoder structure in models like T5?

Answer: In sequence-to-sequence tasks in NLP, we use the encoder-decoder structure. The encoder processes input data, and the decoder generates output data.

Q4. Is it possible to utilize fine-tuned NLP models such as T5 in real-world applications?

Answer: Yes, you can apply fine-tuned models to various real-world NLP tasks, including text generation, translation, and question-answering.

Q5. How can I start fine-tuning NLP models such as T5?

Answer: To begin, you can explore libraries such as Hugging Face. These libraries offer pre-trained models and tools for fine-tuning them on your datasets. Learning NLP fundamentals and deep learning concepts is also crucial.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

kajal

Hi, I am Kajal Kumari. I completed my Master’s in Computer Science & Engineering at IIT(ISM) Dhanbad. I currently work as a Machine Learning Engineer in Hyderabad.
I hope you have enjoyed the article. If you like it, share it with your friends. Please feel free to comment if you have any thoughts that can improve my article writing.
