Adapting BERT for downstream tasks entails taking the pre-trained BERT model, adding a task-specific layer on top, and training it on the target task. This lets the model learn task-specific patterns from your training data while drawing on the broad language understanding the pre-trained BERT model already provides. In Python, you can fine-tune BERT with the Hugging Face transformers library: load a pre-trained checkpoint such as BertForSequenceClassification, prepare your training data (input texts and labels), and train the model on that data with a standard PyTorch training loop or the library's Trainer class.
Fine-tuning BERT adapts the pre-trained model to a specific downstream task by adding a new layer on top and training it on labeled data from that task. This process empowers the model to gain task-specific knowledge and enhance its performance on the target task.
Step 1: Load the pre-trained BERT model and tokenizer using the Hugging Face transformers library.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Choose the appropriate device based on availability (CUDA or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained BERT tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Load the pre-trained BERT model with a sequence-classification head
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
model.to(device)
Step 2: Specify the training data for the target task, consisting of the input text and its corresponding labels.
# Specify the input text and the corresponding labels
input_text = "This is a sample input text"
labels = [1]
Step 3: Tokenize the input text with the BERT tokenizer.
# Tokenize the input text
input_ids = torch.tensor(tokenizer.encode(input_text)).unsqueeze(0)
Step 4: Put the model in training mode.
# Set the model to training mode
model.train()
Step 5: Fine-tune the pre-trained BERT model on the target task's training data. BertForSequenceClassification is a regular PyTorch module, so fine-tuning comes down to running a standard training loop with an optimizer over the training data loader; this trains the classification layer added on top of the pre-trained encoder.
# Set up your dataset, batch size, and other training hyperparameters
dataset_train = ...
batch_size = 32
num_epochs = 3
learning_rate = 2e-5

# Create the data loader for the training set
train_dataloader = torch.utils.data.DataLoader(dataset_train, batch_size=batch_size)

# Use the AdamW optimizer to update the model's parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

# Standard PyTorch training loop (the model does not provide a fit() method)
for epoch in range(num_epochs):
    for batch in train_dataloader:
        optimizer.zero_grad()
        batch = {key: value.to(device) for key, value in batch.items()}
        outputs = model(**batch)   # passing "labels" makes the model return a loss
        outputs.loss.backward()
        optimizer.step()
Step 6: Evaluate the fine-tuned BERT model's performance on the target task.
# Switch the model to evaluation mode
model.eval()
# Calculate the logits (unnormalized probabilities) for the input text
with torch.no_grad():
    outputs = model(input_ids.to(device))
    logits = outputs.logits
# Use the logits to generate predictions for the input text
predictions = logits.argmax(dim=-1)
accuracy = ...
These represent the primary steps involved in fine-tuning BERT for a downstream task. You can utilize this as a foundation and customize it according to your specific use case.
Fine-tuning BERT enables the model to acquire task-specific information, enhancing its performance on the target task. It proves particularly valuable when the target task involves a relatively small dataset, as fine-tuning with the small dataset allows the model to learn task-specific information that might not be attainable from the pre-trained BERT model alone.
In standard fine-tuning, the weights of the added layer and of the pre-trained BERT encoder are all updated end-to-end on the target task's training data. A common lighter-weight variant is to freeze the pre-trained BERT weights and update only the added layer, which trains faster and can help when labeled data is scarce, usually at some cost in accuracy.
Typically, the added layer is a classification head that takes the pre-trained BERT model's output and generates logits for each class in the target task. Training on the target task's data enables this head (and, unless frozen, the encoder underneath it) to acquire task-specific information and improve the model's performance on the target task.
To sum up, fine-tuning always updates the layer added on top of the pre-trained BERT model; whether the pre-trained weights are also updated depends on whether you freeze them, and updating everything end-to-end is the usual default.
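If you do want to train only the added head, a minimal sketch (assuming the BertForSequenceClassification model loaded earlier, whose encoder is exposed as model.bert) is to freeze the encoder's parameters before creating the optimizer:
# Optional: freeze the pre-trained BERT encoder so that only the
# classification head added on top is updated during fine-tuning.
for param in model.bert.parameters():
    param.requires_grad = False

# Give the optimizer only the parameters that remain trainable (the head)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)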
Downstream tasks include a variety of natural language processing (NLP) operations that use pre-trained language representation models such as BERT. Several examples of these tasks are given below.
Text classification involves the assignment of a text to predefined categories or labels. For instance, one can train a text classification model to categorize movie reviews as positive or negative.
Use the BertForSequenceClassification class to adapt BERT for text classification. This class takes input text, such as sentences or paragraphs, and generates logits for each class. A minimal sketch is shown below.
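In this sketch, the review text and the 0 = negative / 1 = positive label mapping are made-up illustrations; classifying a single movie review could look like this:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2 gives a binary positive/negative classification head
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

review = "A surprisingly touching film with a wonderful cast."  # hypothetical example
inputs = tokenizer(review, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, 2)
predicted_class = logits.argmax(dim=-1).item()
Until the model has been fine-tuned on labeled reviews, the classification head is randomly initialized, so this prediction is not yet meaningful.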
Natural language inference, also called recognizing textual entailment (RTE), determines the relationship between a given premise text and a hypothesis text. To adapt BERT for natural language inference, you can use the BertForSequenceClassification class provided by the Hugging Face transformers library. This class accepts a pair of premise and hypothesis texts as input and produces logits (unnormalized probabilities) for each of the three classes (entailment, contradiction, and neutral) as output, as sketched below.
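A minimal sketch of how the premise/hypothesis pair is fed to the model (the sentences are made up, and which of the three logits corresponds to entailment, contradiction, or neutral depends on how you define the labels during fine-tuning):
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Three output classes: entailment, contradiction, neutral
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

premise = "A man is playing a guitar on stage."   # hypothetical example
hypothesis = "A man is performing music."         # hypothetical example

# Passing both texts encodes them as one [CLS] premise [SEP] hypothesis [SEP] sequence
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits               # shape: (1, 3)
predicted_class = logits.argmax(dim=-1).item()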
Named entity recognition (NER) involves identifying and classifying named entities in text, such as people, organizations, and locations. The Hugging Face transformers library provides the BertForTokenClassification class to fine-tune BERT for named entity recognition. This class takes the input text and generates logits for each token, indicating that token's class; see the sketch below.
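A minimal sketch of token classification (the sentence and the tag-set size of 9 are made-up placeholders; a real NER setup would use your own label scheme and fine-tuning data):
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels should match your entity tag set (e.g. O, B-PER, I-PER, B-LOC, ...)
model = AutoModelForTokenClassification.from_pretrained("bert-base-uncased", num_labels=9)

text = "Alice moved to Paris last year."          # hypothetical example
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits               # shape: (1, num_tokens, num_labels)
predicted_tags = logits.argmax(dim=-1)            # one predicted class id per token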
Question answering involves producing an answer to a question based on a given context. To fine-tune BERT for question answering, you can use the BertForQuestionAnswering class offered by the Hugging Face transformers library. This class takes both a context and a question as input and outputs the start and end indices of the answer span within the context, as sketched below.
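A minimal sketch of extractive question answering (the context and question are made-up examples; a model fine-tuned on a QA dataset such as SQuAD would be needed to get sensible answers):
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

context = "BERT was released by researchers at Google in 2018."   # hypothetical example
question = "When was BERT released?"

inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start and end token positions, then decode that span
start = outputs.start_logits.argmax(dim=-1).item()
end = outputs.end_logits.argmax(dim=-1).item()
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])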
Researchers continuously explore novel ways to utilize BERT and other language representation models in various NLP tasks. Pre-trained language representation models like BERT make downstream tasks such as the examples above possible, and fine-tuned BERT models can be applied to many other NLP tasks as well.
When BERT is fine-tuned, the pre-trained model is adapted to a particular task or domain by updating its parameters with a limited amount of labeled data. For example, fine-tuning BERT for sentiment analysis requires a dataset containing texts and their respective sentiment labels. This typically entails incorporating a task-specific layer atop the BERT encoder and training the entire model end-to-end, employing an appropriate loss function and optimizer.
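As an illustration, here is a minimal sketch of such a labeled sentiment dataset (the review texts and labels are made up) that could stand in for the dataset_train placeholder used in the training loop above; it reuses the tokenizer loaded earlier:
import torch
from torch.utils.data import Dataset

class SentimentDataset(Dataset):
    """Wraps raw texts and 0/1 sentiment labels as tokenized tensors."""
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.encodings = tokenizer(texts, truncation=True, padding="max_length",
                                   max_length=max_length, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: value[idx] for key, value in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

# Hypothetical examples; a real dataset would contain many more
texts = ["Loved every minute of it.", "A dull and predictable plot."]
labels = [1, 0]
dataset_train = SentimentDataset(texts, labels, tokenizer)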
A. Fine-tuning involves training specific parameters or layers of a pre-existing model checkpoint with labeled data from a specific task. This checkpoint is usually a model pre-trained on vast amounts of text data using unsupervised masked language modeling (MLM).
A. During the fine-tuning step, we adapt the already trained BERT model to a specific downstream task by adding a new layer on top of it and training on data from the target task. This enables the model to acquire task-specific knowledge and enhance its performance on the target task.
A. Yes, fine-tuning typically improves the model's accuracy on the target task. It involves taking a model that has already been pre-trained and further training it on data relevant to the task at hand.
A. To learn its bidirectional representations, BERT is pre-trained on two different NLP tasks: Masked Language Modeling and Next Sentence Prediction.