Sentiment analysis in finance is a powerful tool for understanding market trends and investor behavior. However, general-purpose sentiment models often fall short on financial texts, whose language is complex and nuanced. This project proposes a solution: fine-tuning GPT-4o mini, a lightweight language model. Using the TRC2 dataset, a collection of Reuters financial news articles that we label with sentiment classes using the expert model FinBERT, we aim to enhance GPT-4o mini’s ability to capture financial sentiment nuances.
This project provides an efficient and scalable approach to financial sentiment analysis, opening the door for more nuanced sentiment-based analysis in finance. By the end, we demonstrate that GPT-4o mini, when fine-tuned with domain-specific data, can serve as a viable alternative to more complex models like FinBERT in financial contexts.
For this project, we use the TRC2 (Thomson Reuters Text Research Collection) dataset, a collection of news articles curated by Reuters and made available through the National Institute of Standards and Technology (NIST). The TRC2 dataset includes a comprehensive selection of Reuters financial news articles and is often used to train financial language models because of its wide coverage of financial events.
To obtain the TRC2 dataset, researchers and organizations need to request access through NIST. The dataset is available at NIST TREC Reuters Corpus, which provides details on licensing and usage agreements. You will need to review those instructions and complete the required agreements before access is granted.
Once you obtain the dataset, preprocess and segment it into sentences for sentiment analysis, allowing you to apply FinBERT to generate expert-labeled sentiment classes.
The methodology for fine-tuning GPT-4o mini with sentiment labels derived from FinBERT consists of the following main steps:
To create the fine-tuning dataset, we leverage FinBERT, a language model pre-trained on financial-domain text. We apply FinBERT to each sentence in the TRC2 dataset, generating sentiment labels across three classes: Positive, Negative, and Neutral. This process produces a labeled dataset in which each TRC2 sentence is paired with a sentiment, providing the training signal for GPT-4o mini.
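The labeled data is then preprocessed and formatted into a JSONL structure suitable for OpenAI’s fine-tuning API. Each data point is a JSON object on its own line with a messages list of three entries: a system message that sets the assistant’s role, a user message containing the sentence, and an assistant message containing the sentiment label.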
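As a minimal sketch of that structure, the snippet below builds one training record and serializes it as a single JSONL line. The system prompt is the one used in the labeling code later in this guide; the helper name `make_record` and the sample sentence are illustrative assumptions, not part of the original pipeline.

```python
import json

SYSTEM_PROMPT = "The assistant is a financial expert."

def make_record(sentence, label):
    """Build one chat-format fine-tuning record (helper name is illustrative)."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sentence},
            {"role": "assistant", "content": label},
        ]
    }

# Hypothetical sentence, used only to show the shape of one line
record = make_record("Shares rose 5% after the earnings report.", "positive")
line = json.dumps(record)
print(line)
```

Each line of the JSONL file holds exactly one such object, so the file can be streamed and validated record by record.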
After labeling, we perform additional preprocessing steps, such as converting labels to lowercase for consistency and stratifying the data so that each split preserves the label distribution. We also split the dataset into training and validation sets, reserving 80% of the data for training and 20% for validation, which helps assess the model’s generalization ability.
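To make the idea of a stratified 80/20 split concrete, here is a small standard-library sketch. It is not the code used in this project (which relies on scikit-learn further on); the function name `stratified_split` and the toy data are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_split(items, labels, train_fraction=0.8, seed=42):
    """Split items so each label keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append((item, label))
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)                       # randomize within the class
        cut = int(len(group) * train_fraction)   # per-class cut point
        train.extend(group[:cut])
        val.extend(group[cut:])
    rng.shuffle(train)                           # avoid class-ordered output
    rng.shuffle(val)
    return train, val

# Toy data: 10 neutral, 5 positive, 5 negative placeholder sentences
labels = ["neutral"] * 10 + ["positive"] * 5 + ["negative"] * 5
items = [f"sentence {i}" for i in range(len(labels))]
train, val = stratified_split(items, labels)
print(Counter(label for _, label in train))
print(Counter(label for _, label in val))
```

Because the cut is made per class, the 2:1:1 label ratio of the toy data carries over to both the training and the validation set.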
Using OpenAI’s fine-tuning API, we fine-tune GPT-4o mini with the pre-labeled dataset. Fine-tuning settings, such as learning rate, batch size, and number of epochs, are optimized to achieve a balance between model accuracy and generalizability. This process enables GPT-4o mini to learn from domain-specific data and improves its performance on financial sentiment analysis tasks.
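The original walkthrough does not include the job-submission code, so the following is only a hedged sketch of how the upload and job creation could look with the `openai` Python package (version 1.0 or later). The helper names, the base model snapshot `gpt-4o-mini-2024-07-18`, and the optional `n_epochs` override are assumptions; check the current API reference for the accepted hyperparameter layout before relying on it.

```python
def build_job_request(training_file_id, validation_file_id,
                      base_model="gpt-4o-mini-2024-07-18", n_epochs=None):
    """Assemble keyword arguments for a fine-tuning job (helper is illustrative)."""
    request = {
        "training_file": training_file_id,
        "validation_file": validation_file_id,
        "model": base_model,
    }
    if n_epochs is not None:
        # Optional override; when omitted, the API picks its own defaults
        request["hyperparameters"] = {"n_epochs": n_epochs}
    return request

def upload_and_submit(client, train_path, val_path, **kwargs):
    """Upload both JSONL files, then create the job.

    `client` is expected to be an openai.OpenAI instance.
    """
    with open(train_path, "rb") as f:
        train_file = client.files.create(file=f, purpose="fine-tune")
    with open(val_path, "rb") as f:
        val_file = client.files.create(file=f, purpose="fine-tune")
    request = build_job_request(train_file.id, val_file.id, **kwargs)
    return client.fine_tuning.jobs.create(**request)

# Example (needs the openai package and an API key in the environment):
# from openai import OpenAI
# job = upload_and_submit(OpenAI(), "training_data.jsonl", "validation_data.jsonl", n_epochs=3)
# print(job.id, job.status)
```

Keeping the request construction separate from the network calls makes it easy to inspect the settings before a paid job is started.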
After training, the model’s performance is evaluated using common sentiment analysis metrics such as accuracy and F1-score, allowing a direct comparison with FinBERT’s performance on the same data. This benchmarking shows how well GPT-4o mini generalizes sentiment classifications within the financial domain and whether it can match or outperform FinBERT in accuracy.
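For readers who want to see how these metrics are computed, the sketch below implements accuracy and macro-averaged F1 in plain Python. The labels and predictions are made-up toy values, and macro averaging is one reasonable choice among several; the original evaluation code later in this guide reports accuracy only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_per_class(y_true, y_pred, labels):
    """F1 for each label, computed from true/false positives and false negatives."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)

LABELS = ["positive", "negative", "neutral"]
# Toy values for illustration only
y_true = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "negative"]

print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"macro F1: {macro_f1(y_true, y_pred, LABELS):.4f}")
```

Macro F1 weights all three classes equally, which matters when one class (often neutral in news text) dominates the data.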
Upon confirming superior performance, GPT-4o mini is ready for deployment in real-world financial applications, such as market analysis, investment advisory, and automated news sentiment tracking. This fine-tuned model provides an efficient alternative to more complex financial models, offering robust, scalable sentiment analysis capabilities suitable for integration into financial systems.
Follow this structured, step-by-step approach to seamlessly navigate through each stage of the process. Whether you’re a beginner or experienced, this guide ensures clarity and successful implementation from start to finish.
Load Required Libraries and Configure the Environment.
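from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import pandas as pd
from tqdm import tqdm

# Load the FinBERT tokenizer and sequence-classification model
tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")

# Use the GPU when available; the model must be on the same device as its inputs
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()

def get_sentiment(text):
    # Tokenize one sentence, truncating to at most 512 tokens
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # Index of the highest-scoring class
    sentiment = torch.argmax(logits, dim=1).item()
    # List order must match the model's class order (verify against model.config.id2label)
    sentiment_label = ["Positive", "Negative", "Neutral"][sentiment]
    return sentiment_label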
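The labeling function above maps the winning class index to a name through a hard-coded list. As a hedged alternative, the sketch below does the same mapping through an explicit id-to-label dictionary and also returns the softmax confidence. The dictionary here is assumed to mirror the ProsusAI/finbert configuration; in practice it can be read from `model.config.id2label`. The helper names and the example logits are illustrative.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_from_logits(logits, id2label):
    """Return the capitalized label with the highest probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best].capitalize(), probs[best]

# Assumed mapping; read it from model.config.id2label when the model is loaded
ID2LABEL = {0: "positive", 1: "negative", 2: "neutral"}

# Made-up logits for a single sentence
label, confidence = label_from_logits([0.2, 2.5, -1.0], ID2LABEL)
print(label, round(confidence, 3))
```

With the real model, the logits for one sentence would come from `outputs.logits[0].tolist()`. Keeping the confidence makes it possible to drop low-confidence labels before fine-tuning, which the original pipeline does not do.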
You must carefully preprocess the TRC2 dataset to retain only relevant sentences for fine-tuning. The following steps outline how to read, clean, split, and filter the data from the TRC2 dataset.
Given the constraints of non-disclosure, this section provides a high-level overview of the data preprocessing workflow with pseudocode.
# Load the compressed dataset from file
open compressed_file as file:
    # Read the contents of the file into memory
    data = read_file(file)

# Extract relevant sections of each document
for each document in data:
    extract document_id
    extract date
    extract main_text_content

# Define a function to clean and segment text content
function clean_and_segment_text(text):
    # Remove unwanted characters and whitespace
    cleaned_text = remove_special_characters(text)
    cleaned_text = standardize_whitespace(cleaned_text)
    # Split the cleaned text into sentences or text segments
    sentences = split_into_sentences(cleaned_text)
    return sentences

# Apply the cleaning and segmentation function to each document’s content
for each document in data:
    sentences = clean_and_segment_text(document['main_text_content'])
    save sentences to structured format

# Create a structured data storage for individual sentences
initialize empty list of structured_data
for each sentence in sentences:
    # Append sentence to structured data
    append sentence to structured_data

# Define a function to filter out unwanted sentences based on specific criteria
function filter_sentences(sentence):
    if sentence is too short:
        return False
    if sentence contains specific patterns (e.g., dates or excessive symbols):
        return False
    if sentence matches unwanted formatting characteristics:
        return False
    return True

# Apply the filter to structured data
filtered_data = [sentence for sentence in structured_data if filter_sentences(sentence)]

# Further filter the sentences based on minimum length or other criteria
final_data = [sentence for sentence in filtered_data if meets_minimum_length(sentence)]

# Save the final data structure for model training
save final_data as structured_file
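Since the actual preprocessing code cannot be shared, the following is only a runnable sketch of the clean, segment, and filter steps outlined in the pseudocode. Every concrete rule in it is an assumption made for illustration: the ASCII-only cleaning, the regex sentence splitter, the five-word minimum, and the 30% symbol ratio are hypothetical and are not the project's real criteria.

```python
import re

MIN_WORDS = 5            # hypothetical minimum sentence length
MAX_SYMBOL_RATIO = 0.3   # hypothetical cap on non-alphanumeric characters

def clean_and_segment_text(text):
    """Clean raw text and split it into sentences with a naive regex."""
    # Replace anything outside printable ASCII (including newlines and tabs) with a space
    cleaned = re.sub(r"[^\x20-\x7E]", " ", text)
    # Collapse runs of whitespace
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Split after ., ! or ? when followed by whitespace and a capital letter
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", cleaned)
    return [p.strip() for p in parts if p.strip()]

def filter_sentence(sentence):
    """Return True when the sentence looks usable for labeling."""
    if len(sentence.split()) < MIN_WORDS:
        return False                                  # too short
    if re.fullmatch(r"[\d\s/:.,-]+", sentence):
        return False                                  # only dates, times, or numbers
    symbols = sum(not (c.isalnum() or c.isspace()) for c in sentence)
    if symbols / len(sentence) > MAX_SYMBOL_RATIO:
        return False                                  # mostly punctuation or markup
    return True

# Made-up text standing in for one document's main content
raw = ("Acme Corp shares rose 5% on Monday after strong earnings.\n\n"
       " Reuters 2008.\tAnalysts expect further gains in the next quarter. Up 3%.")
sentences = clean_and_segment_text(raw)
final_data = [s for s in sentences if filter_sentence(s)]
print(final_data)
```

A production pipeline would likely use a proper sentence tokenizer and preserve non-ASCII characters such as currency symbols; the regex approach is kept here only to stay dependency-free.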
# Sample one million sentences for labeling (fixed seed for reproducibility)
df_sampled = df.sample(n=1000000, random_state=42).reset_index(drop=True)

import json

# Label each sentence with FinBERT and build one chat-format record per sentence
jsonl_data = []
for _, row in tqdm(df_sampled.iterrows(), total=df_sampled.shape[0]):
    content = row['sentence']
    sentiment = get_sentiment(content)
    jsonl_entry = {
        "messages": [
            {"role": "system", "content": "The assistant is a financial expert."},
            {"role": "user", "content": content},
            {"role": "assistant", "content": sentiment}
        ]
    }
    jsonl_data.append(jsonl_entry)

# Write one JSON object per line
with open('finetuning_data.jsonl', 'w') as jsonl_file:
    for entry in jsonl_data:
        jsonl_file.write(json.dumps(entry) + '\n')
# Reload the labeled records
with open('finetuning_data.jsonl', 'r') as jsonl_file:
    data = [json.loads(line) for line in jsonl_file]

# Lowercase the assistant label in every record
for entry in data:
    entry["messages"][2]["content"] = entry["messages"][2]["content"].lower()

# Save the lowercased records to a new file
with open('finetuning_data_lowercase.jsonl', 'w') as new_jsonl_file:
    for entry in data:
        new_jsonl_file.write(json.dumps(entry) + '\n')
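import random

# Shuffle before splitting so both sets come from the same distribution
# (the fixed seed is an added assumption, chosen to match random_state elsewhere)
random.seed(42)
random.shuffle(data)

# 80% of the records for training, 20% for validation
split_ratio = 0.8
split_index = int(len(data) * split_ratio)
training_data = data[:split_index]
validation_data = data[split_index:]

with open('training_data.jsonl', 'w') as train_file:
    for entry in training_data:
        train_file.write(json.dumps(entry) + '\n')

with open('validation_data.jsonl', 'w') as val_file:
    for entry in validation_data:
        val_file.write(json.dumps(entry) + '\n')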
from sklearn.model_selection import train_test_split

# Flatten the records into a DataFrame for stratified sampling
data_df = pd.DataFrame({
    'content': [entry["messages"][1]["content"] for entry in data],
    'label': [entry["messages"][2]["content"] for entry in data]
})

# Keep a stratified 10% sample, then split it 80/20 while preserving the label distribution
df_sampled, _ = train_test_split(data_df, stratify=data_df['label'], test_size=0.9, random_state=42)
train_df, val_df = train_test_split(df_sampled, stratify=df_sampled['label'], test_size=0.2, random_state=42)

def df_to_jsonl(df, filename):
    # Rebuild the chat-format records and write one JSON object per line
    jsonl_data = []
    for _, row in df.iterrows():
        jsonl_entry = {
            "messages": [
                {"role": "system", "content": "The assistant is a financial expert."},
                {"role": "user", "content": row['content']},
                {"role": "assistant", "content": row['label']}
            ]
        }
        jsonl_data.append(jsonl_entry)
    with open(filename, 'w') as jsonl_file:
        for entry in jsonl_data:
            jsonl_file.write(json.dumps(entry) + '\n')

df_to_jsonl(train_df, 'reduced_training_data.jsonl')
df_to_jsonl(val_df, 'reduced_validation_data.jsonl')
To evaluate the fine-tuned GPT-4o mini model’s performance, we tested it on a labeled financial sentiment dataset available on Kaggle. This dataset contains 5,843 labeled sentences in financial contexts, which allows for a meaningful comparison between the fine-tuned model and FinBERT.
FinBERT scored an accuracy of 75.81%, while the fine-tuned GPT-4o mini model achieved 76.46%, demonstrating a slight improvement.
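To put the size of that gap in context, the sketch below applies a normal-approximation 95% interval to each reported accuracy, using the 5,843-sentence test set size stated above. This is a rough guide rather than a formal significance test: both models were scored on the same sentences, so a paired comparison would be more appropriate, and none was reported in the original evaluation. The function name is illustrative.

```python
import math

def accuracy_interval(acc, n, z=1.96):
    """Normal-approximation interval for an accuracy measured on n samples."""
    se = math.sqrt(acc * (1 - acc) / n)   # standard error of a proportion
    return acc - z * se, acc + z * se

N = 5843                   # test set size reported above
FINBERT_ACC = 0.7581       # reported accuracies
GPT4O_MINI_ACC = 0.7646

finbert_low, finbert_high = accuracy_interval(FINBERT_ACC, N)
gpt_low, gpt_high = accuracy_interval(GPT4O_MINI_ACC, N)
gap_sentences = round((GPT4O_MINI_ACC - FINBERT_ACC) * N)

print(f"FinBERT:     {FINBERT_ACC:.4f} ({finbert_low:.4f} to {finbert_high:.4f})")
print(f"GPT-4o mini: {GPT4O_MINI_ACC:.4f} ({gpt_low:.4f} to {gpt_high:.4f})")
print(f"Difference in correctly classified sentences: about {gap_sentences}")
```

Under these assumptions the two intervals overlap, which is consistent with describing the improvement as slight.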
Here’s the code used for testing:
import pandas as pd
import os
import openai
from dotenv import load_dotenv

# Load the CSV file
csv_file_path = 'data.csv'  # Replace with your actual file path
df = pd.read_csv(csv_file_path)

# Convert DataFrame to text format
with open('sentences.txt', 'w', encoding='utf-8') as f:
    for index, row in df.iterrows():
        sentence = row['Sentence'].strip()  # Clean sentence
        sentiment = row['Sentiment'].strip().lower()  # Ensure sentiment is lowercase and clean
        f.write(f"{sentence} @{sentiment}\n")
# Load environment variables from a .env file, if present
load_dotenv()

# Set your OpenAI API key
openai.api_key = os.getenv("OPENAI_API_KEY")  # Ensure OPENAI_API_KEY is set in your environment variables
# Path to the dataset text file
file_path = 'sentences.txt' # Text file containing sentences and labels
# Read sentences and true labels from the dataset
sentences = []
true_labels = []
with open(file_path, 'r', encoding='utf-8') as file:
    lines = file.readlines()

# Extract sentences and labels
for line in lines:
    line = line.strip()
    if '@' in line:
        # Split on the last '@' so any '@' inside the sentence is preserved
        sentence, label = line.rsplit('@', 1)
        sentences.append(sentence.strip())
        true_labels.append(label.strip())
# Function to get predictions from the fine-tuned model
# Note: openai.ChatCompletion is the legacy interface and requires openai<1.0
def get_openai_predictions(sentence, model="your_finetuned_model_name"):  # Replace with your model name
    try:
        response = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a financial sentiment analysis expert."},
                {"role": "user", "content": sentence}
            ]
        )
        return response['choices'][0]['message']['content'].strip()
    except Exception as e:
        print(f"Error generating prediction for sentence: '{sentence}'. Error: {e}")
        return "unknown"
# Generate predictions for the dataset
predicted_labels = []
for sentence in sentences:
    prediction = get_openai_predictions(sentence)
    # Normalize the predictions to 'positive', 'neutral', 'negative'
    if 'positive' in prediction.lower():
        predicted_labels.append('positive')
    elif 'neutral' in prediction.lower():
        predicted_labels.append('neutral')
    elif 'negative' in prediction.lower():
        predicted_labels.append('negative')
    else:
        # Keep the lists aligned when the reply contains no recognizable label
        predicted_labels.append('unknown')

# Calculate the model's accuracy
correct_count = sum(pred == true for pred, true in zip(predicted_labels, true_labels))
accuracy = correct_count / len(sentences)
print(f'Accuracy: {accuracy:.4f}')  # Reported result: 0.7646
By combining FinBERT’s financial-domain labels with the flexibility of GPT-4o mini, this project produces a financial sentiment model that slightly surpasses FinBERT in accuracy on the Kaggle test set (76.46% versus 75.81%). This guide and methodology pave the way for replicable, scalable, and interpretable sentiment analysis, specifically tailored to the financial industry.
A. GPT-4o mini provides a lightweight, flexible alternative and can outperform FinBERT on specific tasks with fine-tuning. By fine-tuning with domain-specific data, GPT-4o mini can capture nuanced sentiment patterns in financial texts while being more computationally efficient and easier to deploy.
A. To access the TRC2 dataset, submit a request through the National Institute of Standards and Technology (NIST) at this link. Review the website’s instructions to complete licensing and usage agreements, typically required for both research and commercial use.
A. You can also use other datasets like the Financial PhraseBank or custom datasets containing labeled financial texts. The TRC2 dataset suits training sentiment models particularly well, as it includes financial news content and covers a wide range of financial topics.
A. FinBERT is a financial domain-specific language model that is pre-trained on financial data and fine-tuned for sentiment analysis. When applied to the TRC2 sentences, it categorizes each sentence as Positive, Negative, or Neutral based on the language context in financial texts.
A. Converting labels to lowercase gives every label a single canonical form in the fine-tuning data. Because string comparisons are case-sensitive, this helps prevent mismatches during evaluation and maintains a uniform structure in the JSONL dataset.