Have you ever thought about how to make communication easier for people who use a mix of Hindi and English, commonly known as Hinglish? With the growing use of Hinglish in everyday conversations, social media, and advertising, there’s a need for tools that can accurately translate between English and Hinglish. This is where advanced language models like Gemma 2 9B come into play. By fine-tuning this model, we can create solutions that understand the unique blend of Hindi and English, making communication more effective for a wider audience.
Gemma 2 models represent a significant advancement in artificial intelligence, offering powerful language processing capabilities with a focus on efficiency and accessibility. These models are designed to excel in tasks such as text generation, code writing, and problem-solving. With their compact size and robust performance, Gemma 2 models provide a versatile tool for developers and users alike. They are particularly noted for their competitive performance relative to larger models.
Fine-tuning the multilingual Gemma 2 9B model can be highly beneficial for Hindi translations due to its robust multilingual capabilities and adaptability.
Unsloth AI, founded in 2023 and based in San Francisco, is an innovative startup revolutionizing the fine-tuning and training of large language models (LLMs). With a focus on speed and efficiency, Unsloth’s platform enables model training up to 30 times faster while using 90% less memory compared to traditional methods. This is achieved through advanced software optimizations, such as handwritten GPU kernels, rather than relying on hardware upgrades. The company embraces an open-source approach, boasting over 8 million monthly downloads and 29,000 GitHub stars. By making AI training more accessible and cost-effective, Unsloth AI caters to developers and enterprises alike, fostering a collaborative and inclusive AI ecosystem.
Unsloth speeds up LLM training through several techniques. It manually derives backpropagation steps (a kind of hand-written autograd) for faster gradient calculations, optimizes chained matrix multiplications, and builds custom, more efficient GPU kernels written in the Triton language. It also uses Flash Attention to focus computation on the critical parts of the input. Along with other memory-efficient strategies, these optimizations improve both training speed and memory usage.
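To give a concrete, if toy, sense of what "manually derived backpropagation" means, the sketch below hand-writes the backward pass for a single matrix multiplication with PyTorch's autograd.Function. This is purely illustrative and says nothing about Unsloth's actual kernels, which are far more involved.

import torch

class ManualLinear(torch.autograd.Function):
    # Toy hand-derived backward for y = x @ W (illustration only)
    @staticmethod
    def forward(ctx, x, W):
        ctx.save_for_backward(x, W)
        return x @ W

    @staticmethod
    def backward(ctx, grad_out):
        x, W = ctx.saved_tensors
        grad_x = grad_out @ W.t()   # dL/dx, derived by hand
        grad_W = x.t() @ grad_out   # dL/dW, derived by hand
        return grad_x, grad_W

x = torch.randn(4, 8, requires_grad=True)
W = torch.randn(8, 3, requires_grad=True)
loss = ManualLinear.apply(x, W).sum()
loss.backward()   # runs the hand-written backward above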
In the following tutorial, we fine-tune the multilingual Gemma 2 9B model on a Hinglish dataset, leveraging the Unsloth AI library on Google Colab with a T4 GPU. We save the fine-tuned model to Hugging Face and then query it with different inputs through Ollama. Finally, we explore how the fine-tuned model produces more accurate English-to-Hinglish translations.
We will first install the necessary libraries:
!pip install unsloth
The code below loads the pre-trained Gemma 2 9B language model using the unsloth library. It sets configuration options like a maximum sequence length of 2048 tokens and enables 4-bit quantization to reduce memory usage. The data type (dtype) is auto-detected, and the model and tokenizer are loaded for use in further language processing tasks. This setup optimizes memory efficiency while working with large language models.
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100; Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-9b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
To add LoRA adapters, we only need to update 1 to 10% of all parameters. The code below uses the FastLanguageModel.get_peft_model function to adapt the model with LoRA (Low-Rank Adaptation). It specifies parameters such as the rank (r = 16), the target modules for adaptation, and optimization settings like lora_alpha and bias. The code also enables Unsloth's gradient checkpointing for efficient memory usage and sets a random state for reproducibility.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    # "unsloth" uses 30% less VRAM and fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)
The code below defines a prompt formatting function for preparing training data in a structured format. It starts by creating a template (alpaca_prompt) that includes placeholders for the instruction, input, and output. The formatting_prompts_func function takes in a batch of examples, extracts the English (en) and Hinglish (hi_ng) text, and formats them into the defined template. It adds an EOS_TOKEN (End-of-Sequence token) at the end of each formatted prompt to prevent the model from generating responses indefinitely. The final output is a dictionary with the formatted text for each example, ready for model training or fine-tuning.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    inputs = examples["en"]
    outputs = examples["hi_ng"]
    # Repeat the instruction once per example so zip() covers the whole batch
    instructions = ["Translate English to Hinglish"] * len(inputs)
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}
The code below prepares the dataset in the correct format, with each entry consisting of a properly structured instruction-input-output prompt for Hinglish translation tasks.
from datasets import load_dataset, Dataset, DatasetDict

dataset = load_dataset("nateraw/english-to-hinglish", split = "train")
dataset = dataset.remove_columns(["source"])
df_pandas = dataset.to_pandas()

def apply_format(col1, col2):
    instruction = "Translate English to Hinglish"
    text = alpaca_prompt.format(instruction, col1, col2) + EOS_TOKEN
    return text

df_pandas['text'] = df_pandas.apply(lambda e: apply_format(e['en'], e['hi_ng']), axis = 1)
df_pandas.drop(['en', 'hi_ng'], axis = 1, inplace = True)
dataset = Dataset.from_pandas(df_pandas)
The code below initializes an SFTTrainer for fine-tuning a model using the trl library. It sets up training parameters such as batch size, gradient accumulation steps, and learning rate within TrainingArguments. The trainer also configures logging and optimization settings, including the use of mixed precision (fp16 or bf16) based on hardware support. The training process is optimized with an AdamW optimizer and a linear learning rate scheduler.
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,  # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        # Logging arguments
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",  # Use this for WandB etc.
    ),
)
trainer_stats = trainer.train()
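trainer.train() returns a TrainOutput object; its metrics dictionary (a standard transformers field) can be inspected to see how long the 60 steps took and the training loss:

print(trainer_stats.metrics.get("train_runtime"), "seconds")
print(trainer_stats.metrics.get("train_loss"))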
The code below sets up inference for the fine-tuned model using FastLanguageModel. It first prepares a prompt (alpaca_prompt) for translation from English to Hinglish by formatting it with an example input. The prompt is tokenized and transferred to a GPU (cuda) for efficient computation. The model then generates a response with a maximum of 64 new tokens, and the output is decoded back into text. Finally, it extracts the part of the output after the “### Response:” section, which contains the generated Hinglish translation.
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Translate English to Hinglish",  # instruction
            "remind me to get eggs today",    # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
output = tokenizer.batch_decode(outputs)
output[0].split("### Response:\n")[1]
Output
'mujhe aaj eggs lene ke liye yaad dilaayen<eos>'
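As an optional variation on the inference cell above, a transformers TextStreamer can be passed to generate so tokens are printed as they are produced rather than decoded at the end:

from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64, use_cache = True)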
The following code saves the trained model locally and pushes it to the Hugging Face Hub. You will need to provide a Hugging Face token with write access to the Hub.
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
model.push_to_hub("mimidutta007/english_to_hinglish_FTgemma2", token = "") # Online saving
tokenizer.push_to_hub("mimidutta007/english_to_hinglish_FTgemma2", token = "") # Online saving
You can find the model on the Hugging Face Hub. I have also converted it to GGUF format so that we can query the model through Ollama as well.
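A sketch of that GGUF export, assuming Unsloth's push_to_hub_gguf helper; the quantization method shown is just one common choice:

model.push_to_hub_gguf(
    "mimidutta007/english_to_hinglish_FTgemma2",
    tokenizer,
    quantization_method = "q4_k_m",  # illustrative quantization choice
    token = "",  # your HF write token
)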
Learn how to interact with the fine-tuned Gemma 2 9B model using Ollama, enabling seamless English-to-Hinglish translations through efficient API queries.
This code installs the Ollama software and the langchain-ollama library, which allows interaction with language models via Ollama. It then starts Ollama as a background subprocess (subprocess.Popen) to run in a non-blocking manner. After waiting for 3 seconds (time.sleep(3)), the code pulls a fine-tuned model (english_to_hinglish_FTgemma2) from Ollama using the ollama pull command. This setup enables the model to be used for English-to-Hinglish translation tasks.
# Installing Ollama and the langchain-ollama library
!curl -fsSL https://ollama.com/install.sh | sh
!pip install langchain-ollama

# Starting a subprocess so that Ollama runs in a non-blocking manner
import subprocess
import time

subprocess.Popen(["ollama", "serve"])
time.sleep(3)

# Pulling the fine-tuned model
!ollama pull hf.co/mimidutta007/english_to_hinglish_FTgemma2
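Optionally, you can confirm the model was pulled before querying it:

!ollama list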
This code sets up a prompt template using langchain for the English-to-Hinglish translation task. It defines a template with placeholders for the instruction and input, creates a ChatPromptTemplate from it, and instantiates the fine-tuned Hinglish translation model via OllamaLLM. The prompt and model are combined into a chain, the input data is passed to the chain to generate a translation, and the result is displayed in Markdown format.
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
from IPython.display import Markdown

# Define the template
template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{Instruction}
### Input:
{Input}
### Response:
"""

# Create a prompt template
prompt = ChatPromptTemplate.from_template(template)

# Instantiate the model
model = OllamaLLM(model="hf.co/mimidutta007/english_to_hinglish_FTgemma2")

# Chain the prompt and model
chain = prompt | model

input_data = {
    "Instruction": "Translate from English to Hinglish",
    "Input": "are there any roads closed in the area due to construction"
}

# Invoke the chain with the input data and display the response in Markdown format
response = chain.invoke(input_data)
Markdown(response)
Output: 'kya area ke kisi road par construction ki wajah se band hai'

Input: "please text Joanne Brennan that I will be five minutes late."
Output: 'Joanne Brenan ko message karo ke main 5 minutes late hoon'

Input: "remind me to get eggs today"
Output: 'mujhe aaj eggs lene ke liye yaad dilaayen<eos>'

Input: "Rotate the Image 90 degrees to the right"
Output: 'Image ko 90 degree tak apne Right me rotate kare'
As we can see from the outputs above, the sentences are translated well into Hinglish: some words are kept in English while others are converted to Hindi. This suggests that the fine-tuned model has picked up Hinglish's mixed grammar and can generate accurate outputs.
Let's also compare what the original Gemma 2 9B model outputs with the fine-tuned model for a few inputs:
| Input | Output From Original Gemma 2 9B | Output From Fine-tuned Gemma 2 9B |
|---|---|---|
| play some motivating music | Karo Yaar Kuch Motivational Gaane | kuch motivate karne wala music bajao |
| play some motivating music | Chal yaar, kucch pumping gaane bajao | kuch motivate karne wali music play kare |
| remind me to get eggs today | Yaar, mujhe aaj ande lene ka yaad dila de! | mujhe aaj eggs lene ke liye yaad dilaayen |
| please text Joanne Brennan that I will be five minutes late | Bhai Joanne Brennan ko msg kar de ki main 5 minute late ho jaunga. | Joanne Brennan ko message karo ki main 5 minutes late ho |
| Request you to please give feedback on comment section | Kya comment section mein kya chal raha hai? Bata de feedback do! | Mujhe comment section par apne feedback dene ki request hai |
The development of LLM models for Hinglish translation is crucial for bridging the gap between formal languages and the hybrid dialect commonly used in India’s everyday communication. Fine-tuning the multilingual Gemma 2 9B model offers significant advantages, especially with its efficiency, multilingual strengths, and adaptability to Hinglish’s unique nuances. This approach not only enhances translation accuracy but also facilitates better communication in personal and professional contexts. With the support of Unsloth AI’s innovative fine-tuning capabilities, this model can revolutionize Hinglish translation and improve engagement across diverse audiences.
Q1. Why is Hinglish translation important?
A. Hinglish, a blend of Hindi and English, is widely used in informal communication in India, especially on social media, in advertising, and in daily conversations. Developing LLM models for Hinglish translation helps businesses and individuals effectively communicate with a broader audience, improving engagement and bridging the gap between formal and colloquial language.
Q2. What is the Gemma 2 9B model?
A. The Gemma 2 9B model is a powerful language processing tool with 9 billion parameters, offering robust performance across multilingual tasks. Its compact size, high efficiency, and adaptability make it an ideal candidate for fine-tuning on Hinglish datasets, improving translation accuracy and capturing Hinglish’s unique syntax and cultural nuances.
Q3. How does fine-tuning improve English-to-Hinglish translation?
A. Fine-tuning the Gemma 2 9B model using curated Hinglish datasets allows the model to adapt to the language’s distinct syntax, grammar, and vocabulary. This customization ensures more accurate and culturally relevant translations from English to Hinglish, improving communication in both personal and professional contexts.
Q4. What does Unsloth AI bring to the fine-tuning process?
A. Unsloth AI offers significant advantages by enabling faster training (up to 30 times faster) while using 90% less memory than traditional methods. This platform makes the fine-tuning process more efficient, cost-effective, and accessible, helping developers create highly specialized language models with fewer resources.