Real-world data can be very messy and skewed, which can undermine the effectiveness of a predictive model if it is not addressed correctly and in time.
The consequences of skewness become more pronounced when a large model is trained on a skewed dataset, because it is often not practical to retrain such a model from scratch. Besides that, if those models are placed into production immediately, we must be ready for the implications.
This article will test for genre skewness in the GPT and GPT-2 models. I came across this interesting experiment while going through the NLP with Transformers book (which I heartily recommend), so I thought of documenting my own experience and sharing it with you all.
Now, let’s begin!
We will make use of the GPT (openai-gpt) and GPT-2 (gpt2) pre-trained models from the Hugging Face Hub. We will also use Hugging Face’s text-generation pipeline to detect whether skewness (due to over- or under-representation of genres in the training data) is evident in GPT and GPT-2 text generations.
GPT was trained on the BooksCorpus dataset, which consists of about 7,000 unpublished books, while GPT-2 was trained on WebText, a corpus of web pages linked from Reddit posts.
But before we compare them, let’s make sure the two models are of similar size so that the comparison is fair.
For this, first off, we will install transformers and import the necessary libraries.
!pip install transformers
from transformers import pipeline, set_seed
Next, we will define the names of the models we will use for drawing the comparison.
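A minimal way to do this, using the openai-gpt and gpt2 checkpoints from the Hugging Face Hub mentioned above (the variable names here are my own choice):
model_name_gpt = "openai-gpt"
model_name_gpt2 = "gpt2"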
Following that, we will set up a pipeline for the text-generation task for each model.
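A sketch of the pipeline setup, reusing the model names defined above and the text_generation_gpt and text_generation_gpt2 variables that the later snippets reference (the seed value of 42 is an arbitrary choice for reproducibility):
set_seed(42)  # arbitrary seed so that the generations below are reproducible
text_generation_gpt = pipeline("text-generation", model=model_name_gpt)
text_generation_gpt2 = pipeline("text-generation", model=model_name_gpt2)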
Now, we will define a function for calculating the number of parameters in each model.
def model_size(model):
    # Total number of parameters across all of the model's weight tensors
    return sum(params.numel() for params in model.parameters())
Printing the number of parameters in GPT and GPT-2.
print(f"Number of Parameters in GPT: {model_size(text_generation_gpt.model)/1000**2:.1f}M parameters") print(f"Number of Parameters in GPT-2: {model_size(text_generation_gpt2.model)/1000**2:.1f}M parameters")
>> Output:
Number of Parameters in GPT: 116.5M parameters
Number of Parameters in GPT-2: 124.4M parameters
Hence, both models are of similar size, and the comparison is fair.
Now we will define a function to generate completions from each model.
def enum_pipeline_outputs(pipe, prompt, num_return_sequences):
    # Generate several completions for the prompt and return them as one numbered string
    out = pipe(prompt, num_return_sequences=num_return_sequences,
               clean_up_tokenization_spaces=True)
    return "\n".join(f"{i+1}." + s["generated_text"] for i, s in enumerate(out))
We will use a prompt for generating four text completions to draw comparisons between the generated text from both models.
prompt = "Before they left for the supermarket"
I) Generating four output text completions for GPT
print("Text Generated by GPT for the given prompt:n" + enum_pipeline_outputs(text_generation_gpt, prompt, 4))
>> Output of GPT model:
Text Generated by GPT for the given prompt:
1.Before they left for the supermarket.
as she was preparing a pot of coffee the telephone rang. she put it to her ear. " hi, it's me. "
" you've got a visitor. we got the new computer i'm
2.Before they left for the supermarket. " but since he was still holding her captive, and he hadn't released her yet, she didn't understand why he felt the need to keep all her plans a secret from her.
he let go of the
3.Before they left for the supermarket. "
i was shocked. " he's... he's not in love with you. "
" he never was. he never will be again. it's over and over. this is the end for both
4.Before they left for the supermarket. i've already eaten breakfast now and i think i 'll put in a few hours in the gym this morning just to give myself time to go to the bathroom and clean up and get the better of it, but i
II) Generating four output text completions for GPT-2
print("Text Generated by GPT-2 for the given prompt:n" + enum_pipeline_outputs(text_generation_gpt2, prompt, 4))
>> Output of GPT-2 model:
Observation: By comparing just a handful of GPT and GPT-2 outputs, we can clearly sense a genre skew toward romance in the text produced by GPT! This highlights the challenges we face while creating a large text corpus. Also, the biases in the model’s behavior need to be considered with respect to the target audience interacting with the model.
This article presented a comparison of text generations from GPT and GPT-2 to test whether genre skewness is evident in the outputs of either model.
To summarize, the key takeaways from this article are:
1. GPT shows a genre skew toward “romance” due to a strong overrepresentation of romance novels in BooksCorpus. It often imagines a romantic interaction between a man and a woman.
2. GPT-2 was trained on data linked from Reddit. Hence, it mostly adopts the neutral “they” in its generations, which have blog-like or adventure-like elements.
3. The results highlight the challenges that should be addressed while creating a large text corpus. Moreover, the biases in the model’s behavior need to be considered when it comes to the target audience interacting with the model.