Training an Adapter for RoBERTa Model for Sequence Classification Task

Drishti Last Updated : 12 Apr, 2023


Introduction

The current trend in NLP is to download and fine-tune pre-trained models with millions or even billions of parameters. However, storing and sharing such large trained models is slow and expensive. These constraints hinder the development of more general-purpose and adaptable NLP techniques that can learn from and for multiple tasks; in this article, we will focus on sequence classification with the RoBERTa model. To address this, adapters were proposed as small, lightweight, parameter-efficient alternatives to full fine-tuning. They are small bottleneck layers that can be dynamically added to a pre-trained model for different tasks and languages.


In this article, we will train an adapter for the RoBERTa model on the Amazon polarity dataset for a sequence classification task with the help of adapter-transformers, the AdapterHub adaptation of Hugging Face’s transformers library. Additionally, we will compare the performance of the adapter module to a fully fine-tuned RoBERTa model trained on the same dataset.

By the end of this article, you will have learned the following:

How to train an adapter for the RoBERTa model on the Amazon Polarity dataset for the Sequence Classification task?
How can a trained adapter with the Hugging Face pipeline be used to help make quick predictions?
How to extract the adapter from the trained model and save it for later use?
How can the base model’s weights be restored to their original form by deactivating and deleting the adapter?
How to push the trained model to the Hugging Face hub for later use? Additionally, we will see how adapters compare with full fine-tuning.

This article was published as a part of the Data Science Blogathon.

Project Description

This project includes training a task adapter for the RoBERTa model on the Amazon polarity dataset for sequence classification tasks, specifically sentiment analysis. To train, we will use the RoBERTa base model from the Hugging Face hub and the AdapterHub adaptation of Hugging Face’s transformers library. Additionally, we will compare the performance of the adapter module to a fully fine-tuned RoBERTa model trained on the same dataset.

What are Adapters?

Adapters are lightweight alternatives to fully fine-tuned pre-trained models. Currently, adapters are implemented as small feedforward neural networks that are inserted between layers of a pre-trained model. They provide a parameter-efficient, computationally efficient, and modular approach to transfer learning. The following image shows an added adapter.

Source: Adapterhub

During training, all the weights of the pre-trained model are frozen so that only the adapter weights are updated, resulting in modular knowledge representations. Adapters can be easily extracted, interchanged, independently distributed, and dynamically plugged into a language model. These properties make adapters a promising tool for advancing NLP.
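To make the "small bottleneck layer" idea concrete, here is a minimal pure-Python sketch of one adapter block: a down-projection, a non-linearity, an up-projection, and a residual connection. The class name BottleneckAdapter, the helper functions, the GELU activation, and the tiny sizes are all assumptions chosen for illustration; this is not the adapter-transformers implementation.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.01):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gelu(x):
    """Exact GELU; the choice of non-linearity is an assumption here."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class BottleneckAdapter:
    """Hypothetical adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        self.down = make_matrix(bottleneck_size, hidden_size, rng)
        self.down_bias = [0.0] * bottleneck_size
        self.up = make_matrix(hidden_size, bottleneck_size, rng)
        self.up_bias = [0.0] * hidden_size

    def forward(self, hidden):
        # Squeeze the hidden state into the small bottleneck dimension.
        z = [gelu(v + b) for v, b in zip(matvec(self.down, hidden), self.down_bias)]
        # Expand back to the hidden size.
        delta = [v + b for v, b in zip(matvec(self.up, z), self.up_bias)]
        # Residual connection: the frozen layer's output passes through
        # unchanged, plus a small learned correction.
        return [h + d for h, d in zip(hidden, delta)]

    def num_parameters(self):
        return (
            sum(len(row) for row in self.down) + len(self.down_bias)
            + sum(len(row) for row in self.up) + len(self.up_bias)
        )


adapter = BottleneckAdapter(hidden_size=8, bottleneck_size=2)
hidden_state = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 0.1, 0.9]
output = adapter.forward(hidden_state)
print(len(output), adapter.num_parameters())
```

Because of the residual connection, a freshly initialized adapter with small weights leaves the hidden state almost unchanged, so training starts from the behavior of the frozen pre-trained model.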

Significance of Adapters in NLP Transfer Learning

The following are some important points regarding the significance of adapters in NLP transfer learning:

Efficient Use of Pretrained Models: Pretrained language models such as BERT, GPT-2, and RoBERTa have been proven effective in various NLP tasks. However, fine-tuning the entire model can be computationally expensive and time-consuming. Adapters allow for more efficient use of these pretrained models by enabling the insertion of task-specific functionality without modifying the original architecture.
Improved Adaptability: Adapters allow for greater flexibility in adapting pretrained models to new tasks. Rather than fine-tuning the entire model, adapters enable selective modification of specific layers, improving model adaptation to new tasks and leading to better performance.
Cost-Effective: Adapters can often be trained with less data than is required for training a full model, reducing the cost of training and improving the model’s scalability.
Reduced Memory Requirements: Since adapters require fewer parameters than a full model, they can be easily added to a pre-existing model without requiring significant additional memory.
Transfer Learning Across Languages: Adapters can also enable knowledge transfer across languages, allowing models to be trained on a source language and then adapted to a target language with minimal additional training. And hence they can also prove to be very effective in low-resource settings.

Overview of the RoBERTa Model

RoBERTa is a large pre-trained language model developed by Facebook AI and released in 2019. It shares the same architecture as BERT and is a revised version of it, with minor adjustments to the key hyperparameters and embeddings.

Except for the output layers, BERT’s pre-training and fine-tuning procedures use the same architecture. The pre-trained model parameters are used to initialize models for various downstream tasks, and during fine-tuning, all parameters are adjusted. The following figure illustrates BERT’s pre-training and fine-tuning procedures.

Source: Arxiv

In contrast, RoBERTa does not employ the next-sentence prediction pretraining objective and uses much larger mini-batches and learning rates during training. RoBERTa also replaces BERT’s character-level BPE vocabulary with a byte-level BPE tokenizer (similar to GPT-2). Moreover, RoBERTa uses “dynamic masking”: instead of fixing the masked positions once during preprocessing, a new masking pattern is generated each time a sequence is fed to the model. This helps the model learn more robust representations of the input text by forcing it to predict a diverse set of tokens rather than a fixed subset.
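The difference between static and dynamic masking can be sketched in a few lines of plain Python. This is a simplified illustration, not RoBERTa’s actual pretraining code: the helper mask_tokens is hypothetical, it masks a fixed number of whole words, and it skips details such as subword tokenization and the usual masking rate.

```python
import random

MASK = "<mask>"


def mask_tokens(tokens, num_to_mask, rng):
    """Replace `num_to_mask` randomly chosen positions with the mask token."""
    positions = set(rng.sample(range(len(tokens)), num_to_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


tokens = "the soundtrack was beautiful and soulful".split()

# Static masking: the mask is chosen once during preprocessing and reused
# unchanged in every epoch.
static_view = mask_tokens(tokens, num_to_mask=2, rng=random.Random(0))
static_epochs = [static_view for _ in range(3)]

# Dynamic masking: a fresh mask is sampled every time the sequence is served.
rng = random.Random(0)
dynamic_epochs = [mask_tokens(tokens, num_to_mask=2, rng=rng) for _ in range(3)]

for epoch, (s, d) in enumerate(zip(static_epochs, dynamic_epochs)):
    print(epoch, "static:", " ".join(s), "| dynamic:", " ".join(d))
```

With static masking the model sees the same prediction targets in every epoch, whereas dynamic masking draws the targets again on each pass over the data.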

In this article, we will train an adapter for RoBERTa base model for the sequence classification task (more precisely, sentiment analysis). Simply put, a sequence classification task is a task that involves assigning a label or category to a sequence of words or tokens, such as a sentence or document.

Overview of the Dataset

We will use the Amazon Reviews Polarity dataset constructed by Xiang Zhang. This dataset was created by classifying reviews with scores of 1 and 2 as negative and reviews with scores of 4 and 5 as positive. Moreover, the samples with a score of 3 were ignored. Each class has 1,800,000 training samples and 200,000 testing samples.
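The labelling rule described above can be written as a small function. This is only a sketch of the rule as stated in this section; the function name star_to_polarity is hypothetical, the example reviews are made up, and the 0 = negative / 1 = positive encoding is assumed to match the id2label mapping used later in this article.

```python
from typing import Optional


def star_to_polarity(stars: int) -> Optional[int]:
    """Map a 1-5 star score to a polarity label, or None if the review is skipped."""
    if stars in (1, 2):
        return 0  # negative
    if stars in (4, 5):
        return 1  # positive
    if stars == 3:
        return None  # neutral reviews are ignored
    raise ValueError(f"expected a score from 1 to 5, got {stars}")


# Made-up (score, text) pairs for illustration.
reviews = [(5, "Great CD"), (3, "It was fine"), (1, "Broke in a week")]
labelled = [
    (text, star_to_polarity(stars))
    for stars, text in reviews
    if star_to_polarity(stars) is not None
]
print(labelled)
```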

Training the Adapter for RoBERTa Model on Amazon Polarity Dataset

To start we will begin with installing the libraries:

!pip install -U adapter-transformers datasets

And now, we will load the Amazon Reviews Polarity dataset using the HuggingFace dataset:

from datasets import load_dataset

#Loading the dataset
dataset = load_dataset("amazon_polarity")

Now let’s see what our dataset consists of:

dataset

Output:
DatasetDict({
    train: Dataset({
        features: ['label', 'title', 'content'],
        num_rows: 3600000
    })
    test: Dataset({
        features: ['label', 'title', 'content'],
        num_rows: 400000
    })
})

So from the above output, we can see that the Amazon Reviews Polarity dataset consists of 3,600,000 training samples and 400,000 testing samples. Now let’s take a look at what a sample from the train set and test set looks like.

dataset["train"][0]

Output: {'label': 1, 'title': 'Stunning even for the non-gamer', 'content': 'This soundtrack was beautiful! It paints the scenery in your mind so good I would recommend it even to people who hate video game music! I have played the game Chrono Cross, but out of all of the games I have ever played, it has the best music! It backs away and takes a fresher step with great guitars and soulful orchestras. It would impress anyone who cares to listen! ^_^'}

dataset["test"][0]

Output: {'label': 1, 'title': 'Great CD', 'content': 'My lovely Pat has one of the GREAT voices of her generation. I have listened to this CD for YEARS and still LOVE IT. When I\'m in a good mood, it makes me feel better. A bad mood just evaporates like sugar in the rain. This CD just oozes LIFE. The vocals are just STUNNING, and the lyrics just kill. One of life\'s hidden gems. This is a desert island CD in my book. Why she never made it big is just beyond me. Every time I play this, no matter male or female, EVERYBODY says one thing "Who was that singing ?"'}

From the output of print(dataset), dataset["train"][0], and dataset["test"][0], we can see that the dataset consists of three columns, i.e., “label”, “title”, and “content”. We will drop the column named “title” since we won’t need it to train the adapter.

#Removing the column "title" from the dataset
dataset = dataset.remove_columns("title")

Let’s check whether the column “title” has been dropped!

dataset

Fig. 3 Screenshot showing the composition of the dataset after dropping the column “title”

So clearly, the column “title” has been successfully dropped and no longer exists.

Now we will encode all the dataset samples. For this, we will use RobertaTokenizer and the dataset.map() function to encode the input data. Moreover, we will rename the target column “label” to “labels” since that is the name a transformer model expects. Furthermore, we will use the set_format() function to make the dataset format compatible with PyTorch.

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

#Encoding a batch of input data with the help of tokenizer
def encode_batch(batch):
  return tokenizer(batch["content"], max_length=100, truncation=True, padding="max_length")
  
dataset = dataset.map(encode_batch, batched=True)

#Renaming the column "label" to "labels"
dataset = dataset.rename_column("label", "labels")

#Setting the dataset format to torch and mentioning the columns we want to format
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
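To see what max_length, truncation, and padding="max_length" do to each sample, here is a small pure-Python sketch of the idea. The helper pad_or_truncate, the token ids, and the pad id of 1 are assumptions for illustration (check tokenizer.pad_token_id for the real value); the actual tokenizer also takes care to keep the closing special token when it truncates, which this sketch does not.

```python
PAD_ID = 1  # assumed pad id; check tokenizer.pad_token_id for the real value


def pad_or_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]          # truncation: drop the overflow
    padding = max_length - len(kept)       # padding: fill up to max_length
    input_ids = kept + [pad_id] * padding
    # 1 marks real tokens, 0 marks padding the model should ignore.
    attention_mask = [1] * len(kept) + [0] * padding
    return input_ids, attention_mask


# Made-up token ids: one sequence shorter and one longer than max_length.
short_ids, short_mask = pad_or_truncate([0, 713, 16, 2], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(10, 20)), max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Padding every sample to the same length keeps batching simple, at the cost of some wasted computation on short reviews.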

Now, we will use RobertaModelWithHeads class, which is unique to adapter-transformers and allows us to easily add and configure prediction heads.

from transformers import RobertaConfig, RobertaModelWithHeads

#Defining the configuration for the model
config = RobertaConfig.from_pretrained("roberta-base", num_labels=2)

#Setting up the model
model = RobertaModelWithHeads.from_pretrained("roberta-base", config=config)

We will now add an adapter with the help of the add_adapter() method. For this, we will pass an adapter name; we passed “amazon_polarity”. Following this, we will also add a matching classification head. Lastly, we will activate the adapter and prediction head using train_adapter().

Basically, the train_adapter() method does two things:

It freezes all the weights of the pre-trained model such that only the adapter weights are updated during the training.
It also activates the adapter and prediction head to use both in every forward pass.

#Adding adapter to the RoBERTa model
model.add_adapter("amazon_polarity")

# Adding a matching classification head
model.add_classification_head(
    "amazon_polarity",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},
)
  
# Activating the adapter
model.train_adapter("amazon_polarity")
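A quick way to sanity-check that freezing worked is to count how many parameters are still trainable. The sketch below uses stand-in values so that it runs on its own; the helper summarize_trainable and the parameter names and sizes are hypothetical. With a PyTorch model, the same tuples could be built from model.named_parameters(), as noted in the comment.

```python
def summarize_trainable(param_info):
    """param_info: iterable of (name, num_elements, requires_grad) tuples."""
    total = 0
    trainable = 0
    for _name, numel, requires_grad in param_info:
        total += numel
        if requires_grad:
            trainable += numel
    return trainable, total


# Stand-in values for illustration only. With a PyTorch model these tuples
# could be built as:
#   ((n, p.numel(), p.requires_grad) for n, p in model.named_parameters())
toy_params = [
    ("encoder.layer.0.attention.weight", 1000, False),  # frozen base weight
    ("encoder.layer.0.adapter.down.weight", 40, True),  # adapter weight
    ("encoder.layer.0.adapter.up.weight", 40, True),    # adapter weight
    ("heads.classifier.weight", 20, True),              # prediction head
]
trainable, total = summarize_trainable(toy_params)
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.1f}%)")
```

On a real model, a trainable share of only a few percent or less is a good sign that the base weights are frozen as intended.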

We will configure the training process with the help of the TrainingArguments class. Following this, we will also write a function to calculate evaluation accuracy. Lastly, we will pass the arguments to the AdapterTrainer, a class optimized for only training adapters.

import numpy as np
from transformers import TrainingArguments, AdapterTrainer, EvalPrediction

training_args = TrainingArguments(
    learning_rate=3e-4,
    max_steps=80000,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_steps=1000,
    output_dir="adapter-roberta-base-amazon-polarity",
    overwrite_output_dir=True,
    remove_unused_columns=False,
)

def compute_accuracy(eval_pred):
  preds = np.argmax(eval_pred.predictions, axis=1)
  return {"acc": (preds == eval_pred.label_ids).mean()}

trainer = AdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=compute_accuracy,
)
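To see what compute_accuracy does with the model outputs, here is the same argmax-and-compare logic in plain Python on a hand-made example. The logits and labels are made up, and the helper names argmax and accuracy are only for this sketch.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    preds = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Made-up logits for four reviews: column 0 = negative, column 1 = positive.
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 0.5], [1.2, 0.7]]
labels = [0, 1, 0, 0]
print(accuracy(logits, labels))
```

Here the predictions are [0, 1, 1, 0], so three of the four labels match.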

Let’s start training now!

trainer.train()

Fig. 4 Image depicting the training run (Source: Author)

TrainOutput(global_step=80000, training_loss=0.13133217878341674, metrics={'train_runtime': 7884.1676, 'train_samples_per_second': 324.701, 'train_steps_per_second': 10.147, 'total_flos': 1.33836672e+17, 'train_loss': 0.13133217878341674, 'epoch': 0.71})
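The reported epoch of 0.71 follows directly from the training arguments. The arithmetic below is a small check, assuming a single device and no gradient accumulation (neither is stated explicitly above).

```python
max_steps = 80_000          # from TrainingArguments
batch_size = 32             # per_device_train_batch_size, one device assumed
train_rows = 3_600_000      # size of the train split
train_runtime_s = 7884.1676 # train_runtime from the output above

samples_seen = max_steps * batch_size
epochs = samples_seen / train_rows
samples_per_second = samples_seen / train_runtime_s
steps_per_second = max_steps / train_runtime_s

print(f"samples seen:       {samples_seen}")
print(f"epochs:             {epochs:.2f}")
print(f"samples per second: {samples_per_second:.3f}")
print(f"steps per second:   {steps_per_second:.3f}")
```

In other words, the adapter saw about 2.56 million reviews, which is less than one full pass over the 3.6 million training samples.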

Evaluating the Trained Model

Now let’s evaluate the adapter’s performance on the dataset’s test split.

trainer.evaluate()

Figure: RoBERTa model evaluation output for the classification task

We can use the trained model with the help of the Hugging Face pipeline to make quick predictions.

from transformers import TextClassificationPipeline
classifier = TextClassificationPipeline(model=model,
                                        tokenizer=tokenizer,
                                        device=training_args.device.index)
                                        
classifier("I came across a lot of reviews stating that it is the best book out there.")

Output: [{‘label’: ‘positive’, ‘score’: 0.5589291453361511}]
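The score in this output is a probability for the predicted label. For a single-label classifier with two classes, the pipeline turns the two logits into probabilities with a softmax and reports the larger one. The sketch below shows that step in plain Python; the logits are made up, so the resulting score is not the one shown above.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "negative", 1: "positive"}
logits = [-0.1, 0.14]  # made-up logits for one review
probs = softmax(logits)
best = max(range(len(probs)), key=lambda i: probs[i])
prediction = {"label": id2label[best], "score": probs[best]}
print(prediction)
```

A score close to 0.5, as in the output above, means the two logits were nearly equal, so the model was not very confident about this sentence.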

Extracting and Saving the Adapter

Finally, we can also extract the adapter from the trained model and save it for later use. save_adapter() writes the adapter weights and adapter configuration to the given directory.

model.save_adapter("./final_adapter", "amazon_polarity")

!ls -lh final_adapter

Fig. 7 The files present in the final_adapter folder

Deactivating and Deleting the Adapter

Once we are done working with the adapter and no longer need it, we can return the model to its original pre-trained behavior by deactivating and deleting the adapter. Since the base model’s weights were frozen during adapter training, they are unchanged.

#Deactivating the adapter
model.set_active_adapters(None)

#Deleting the added adapter
model.delete_adapter("amazon_polarity")

Pushing the Trained Model to the Hub

We can also push the trained model to the Hugging Face hub for later use. For this, we will log in to the hub, install git-lfs, and then push the model.

from huggingface_hub import notebook_login
notebook_login()

!apt install git-lfs 
!git config --global credential.helper store

trainer.push_to_hub()

Link to the Model Card: https://huggingface.co/DrishtiSharma/adapter-roberta-base-amazon-polarity

Comparison of Adapter with Full Fine-tuning

Since training an adapter updates only the adapter parameters while the parameters of the pre-trained model stay frozen, it greatly reduces the training time, computational cost, and memory footprint compared to full fine-tuning.
The adapter module can be easily integrated with the pre-trained models to adapt them to new tasks without the need to retrain the whole model. Notably, the size of the file, which contains adapter weights, is just 3.5 MB. Both of these aspects highlight its potential for ease of reusability for multiple tasks.
While trying to fine-tune the RoBERTa model on the Amazon Review Polarity dataset, I ran into memory-related issues, which caused the training session to end abruptly at around 40k steps. This highlights the advantage of adapters: in scenarios where computational resources are limited, adapters are a much more promising approach than full fine-tuning.
To draw further conclusions, I trained the adapter and the RoBERTa model on a smaller dataset, i.e., “Rotten Tomatoes”. I was pleasantly surprised that the adapter scored better than the fully fine-tuned model. Notably, after training the adapter for around 113 epochs, the eval_acc was 88.93%, and the model had started to overfit. On the other hand, when the RoBERTa model was trained for the same number of epochs, the eval_acc was 50%, and the train_loss and eval_loss were around 0.693 and still going down. A loss of about 0.693 (ln 2) together with 50% accuracy is what a binary classifier produces when it predicts at chance level, so the fully fine-tuned run had effectively not learned the task yet. Regardless, to draw a fairer and more concrete conclusion, many more experiments need to be conducted.
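The small adapter file size mentioned above can be roughly cross-checked with a back-of-the-envelope estimate. The numbers below are assumptions, not values read from the trained adapter: one bottleneck adapter per encoder layer, a reduction factor of 16, float32 weights, and an approximate base model size of 125 million parameters. The prediction head and any extra layers are ignored.

```python
hidden_size = 768        # RoBERTa-base hidden size
num_layers = 12          # RoBERTa-base encoder layers
reduction_factor = 16    # assumed bottleneck ratio
bytes_per_param = 4      # float32

bottleneck = hidden_size // reduction_factor

# Down-projection (weights + bias) plus up-projection (weights + bias).
per_adapter = (hidden_size * bottleneck + bottleneck) + (
    bottleneck * hidden_size + hidden_size
)
adapter_params = num_layers * per_adapter
size_mb = adapter_params * bytes_per_param / 1e6

base_params = 125_000_000  # approximate size of RoBERTa-base
share = 100 * adapter_params / base_params

print(f"adapter parameters: {adapter_params}")
print(f"estimated size:     {size_mb:.2f} MB")
print(f"share of base:      {share:.2f}%")
```

Under these assumptions the estimate lands in the same ballpark as the 3.5 MB file reported above, and the adapter amounts to well under 1% of the base model’s parameters.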

Applications of the Trained Adapter

Following are some of the potential applications of an Adapter trained on the Amazon Polarity dataset for sequence classification tasks:

Social Media Analysis: The trained adapter can analyze the underlying sentiment in social media posts or comments. Businesses can use this to gauge customer sentiment and respond effectively to negative or constructive feedback in time.
Customer Service: The trained adapter can be used to automatically classify customer support tickets as positive or negative, allowing the support team to prioritize and address customer complaints more effectively and promptly.
Product/Service Reviews: The trained adapter can automatically classify product/service reviews as positive or negative, helping businesses quickly gauge customer satisfaction with their offerings.
Market Research: The trained adapter can also be used for analyzing sentiment in customer feedback surveys, market research forms, etc., which can be further utilized to draw insights about customer sentiment toward their product/service/brand.
Brand Monitoring: The trained model can be used to monitor online mentions of a brand or product and classify them by sentiment, allowing businesses to track their online reputation and respond to negative feedback or complaints.

Pros of the Adapters

Adapters have several advantages over traditional methods. Here are some of the advantages of adapters in NLP:

Efficient Fine-tuning: Adapters can be fine-tuned on new tasks with fewer parameters than training an entire model from scratch.
Modular: Adapters are modular/interchangeable; they can be easily swapped or added to a pre-trained model.
Domain-specific Adaptations: Adapters can be fine-tuned on domain-specific tasks, resulting in better performance at those tasks.
Incremental Learning: Adapters can be used for incremental learning, allowing for efficient continuous learning and adapting the pre-trained model to new data.
Faster Training: Adapters can be trained faster than training the entire model from scratch, which helps in faster experimentation and prototyping.
Smaller Size: Adapters are significantly smaller than a fully fine-tuned model, so they take far less storage and are easier to share; only the small adapter file needs to be kept per task.

Cons of the Adapters

While adapters have several advantages, they have some disadvantages too. Here are some of the disadvantages of adapters:

Reduced Performance: Since an additional adapter layer is added on top of a pre-trained model, this can add computational overhead to the model and affect the model’s performance regarding inference speed and accuracy.
Increased Complexity: Again, as the adapters are added to a pre-trained model, the model must be modified to accept inputs and outputs from the adapter layer. This can, in turn, make the overall architecture of the model more complex.
Limited Expressiveness: Adapters are task-specific and may not be as expressive as a fully-trained model fine-tuned for certain tasks, especially for complex tasks or those requiring domain-specific knowledge.
Limited Transferability: Adapters are trained on limited task-specific data, which may not enable them to generalize well to new tasks or domains, reducing their usefulness when the task or domain differs from the one the adapter was trained on.
Potential for Overfitting: The experiments performed in this article showed that the adapter started to overfit after a certain number of steps, which can lead to poor performance on a downstream task.

Future Research Directions

Following are some of the potential research directions which can help in furthering the advanced development and usage of Adapters:

Exploring Different Adapter Architectures: Adapters are currently implemented as small feedforward neural networks inserted between layers of a pre-trained model. There is huge potential for exploring different architectures for adapters that may offer better performance for specific tasks. This could include investigating new methods for parameter sharing, designing adapters with multiple layers, exploring different activation functions, incorporating attention, etc.
Studying the Impact of Adapter Size: Larger adapters have been shown to work better than smaller ones. But there’s a caveat: the size of the adapter affects the inference speed and the computational cost. Hence, further research could explore the optimal adapter size for specific tasks.
Investigating Multi-Layer Adapters: Currently, adapters are added to a single layer of a pre-trained model. There is a scope for exploring multi-layer adapters that can adapt multiple layers of a model for a given task.
Adapting to Other Modalities: Although adapters have been developed, studied, and tested primarily in the context of NLP, there is a scope for studying their use for other modalities like image, audio processing, etc.
Improving Efficiency and Scalability: The efficiency and scalability of adapter training could be improved much more than it currently is.
Multi-domain Adaptation and Multi-task Learning: Adapters have been shown to adapt to new domains and tasks quickly. Future research can help develop adapters that can simultaneously adapt to multiple domains.
Compression and Pruning with Adapters: The efficiency of the adapters can be further increased by developing methods for compressing or pruning adapters while maintaining their effectiveness.
Adapters for Reinforcement Learning: Investigating the use of adapters for reinforcement learning can enable agents to learn more quickly and effectively in complex environments.

Conclusion

This article showed how to train an adapter to adapt a given pre-trained model to the task at hand while keeping the pre-trained weights frozen. We also saw that once the task is complete, we can easily return the base model to its original form by deactivating and deleting the adapter.

To summarize, the key takeaways from this article are:

Adapters are small bottleneck layers that can be dynamically added to a pre-trained model based on different tasks and languages.
We trained an adapter for the RoBERTa model on the Amazon polarity dataset for the sentiment classification task with the help of adapter-transformers, the AdapterHub adaptation of HuggingFace’s transformers library.
train_adapter() method freezes all the weights of the pre-trained model such that only the adapter weights are updated during the training. It also activates the adapter and prediction head to use both in every forward pass.
The adapter from the trained model can be extracted and saved for later use. save_adapter() writes the adapter weights and adapter configuration to the given directory.
When the adapter is not needed, we can restore the weights of the base model in its original form by deactivating and deleting the adapter.
Adapters seemed to perform better than the fully fine-tuned RoBERTa model, but, to have a concrete conclusion, more experiments must be conducted.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Drishti

I'm a Researcher who works primarily on various Acoustic DL, NLP, and RL tasks. Here, my writing predominantly revolves around topics related to Acoustic DL, NLP, and RL, as well as new emerging technologies. In addition to all of this, I also contribute to open-source projects @Hugging Face.
