Analyzing customer sentiment and key themes from textual data has always been a time-intensive task, requiring data collection, manual labeling, and fine-tuning specialized models. But what if you could skip the hassle of training a model and still achieve accurate results? Enter zero-shot text classification, a groundbreaking approach powered by Large Language Models (LLMs). In this article, we’ll explore how zero-shot classification simplifies sentiment analysis using the SKLLM library (a blend of scikit-learn and LLMs), working through the Women’s E-Commerce Clothing Reviews dataset from Kaggle as a practical example.
Online retailers often receive large volumes of text reviews from customers, making it challenging to quickly analyze the sentiments or key themes. Traditionally, companies would:

- Collect and clean large volumes of review data
- Manually label thousands of examples
- Fine-tune a specialized model on those labels
While effective, fine-tuning requires considerable time, expertise, and computational resources. Enter zero-shot text classification: using Large Language Models (LLMs) directly to classify text with minimal effort. You can simply provide a set of descriptive labels (e.g., “positive,” “negative,” “neutral”) and let the model infer the correct class—no custom training required!
Several points explain why zero-shot is so efficient:

- No labeled training data is needed; you only supply descriptive label names
- There is no fine-tuning step, so you save the associated time and compute
- Labels can be changed or expanded at any time without retraining
We’ll use the Women’s E-Commerce Clothing Reviews dataset from Kaggle.
Key points about the dataset:

- Each row is a single customer review of a women’s clothing item
- The “Review Text” column holds the free-text review we’ll classify
- Some rows are missing review text, so we’ll drop those before classification
Below, we’ll walk through streamlining sentiment analysis and theme detection with zero-shot text classification, using the SKLLM library to classify real-world data with no custom training required.
Make sure you have Python 3.7+ and install SKLLM:
pip install scikit-llm
Additionally, ensure you have a valid API key for an LLM provider (e.g., OpenAI’s API). Configure it for SKLLM in your code:
from skllm.config import SKLLMConfig
# Replace with your actual OpenAI API key
SKLLMConfig.set_openai_key("your_openai_api_key")
(Rather than hardcoding the key, you can store it in a .env file or read it from an environment variable, which is often cleaner.)
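For example, here is a minimal sketch that reads the key from an environment variable instead of hardcoding it (the variable name OPENAI_API_KEY is just a common convention, not something SKLLM requires):

import os
from skllm.config import SKLLMConfig

# Read the API key from an environment variable instead of hardcoding it
SKLLMConfig.set_openai_key(os.environ["OPENAI_API_KEY"])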
import pandas as pd
from skllm.models.gpt.classification.zero_shot import ZeroShotGPTClassifier
# Load dataset
df = pd.read_csv("Womens Clothing E-Commerce Reviews.csv")
# Inspect the first few rows
print(df.head())
We’ll focus on the “Review Text” column. Some rows may have missing values for reviews, so let’s drop any NaNs:
# Filter out rows without review text
df = df.dropna(subset=["Review Text"]).reset_index(drop=True)
# Extract the review texts into X
X = df["Review Text"].tolist()
We’ll do a sentiment classification: [“positive”, “negative”, “neutral”].
Why these three? They’re common sentiment tags. However, you’re free to change or expand them: for example, [“positive”, “negative”, “neutral”, “mixed”].
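For clarity, you can keep the label set in a variable so it’s easy to adjust later (candidate_labels is just an illustrative name, not an SKLLM requirement):

# Candidate labels for zero-shot sentiment classification
candidate_labels = ["positive", "negative", "neutral"]
# To capture ambivalent reviews, you could expand the list, e.g.:
# candidate_labels = ["positive", "negative", "neutral", "mixed"]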
Instantiate the ZeroShotGPTClassifier. We’ll choose gpt-4o as the model, but you can select a different model if you want.
# Create a zero-shot classifier
clf = ZeroShotGPTClassifier(model="gpt-4o")
# Fit the classifier - here we pass `None` for X because we don't need training data
clf.fit(None, ["positive", "negative", "neutral"])
Why fit(None, labels)? In a pure zero-shot scenario, no actual training occurs. The call to fit() is effectively telling the classifier which labels are possible. The model can then choose among them for each review.
# Predict labels for the entire dataset
predictions = clf.predict(X)
# Let’s see the first few results
for review_text, sentiment in zip(X[:5], predictions[:5]):
    print(f"Review: {review_text}")
    print(f"Predicted Sentiment: {sentiment}")
    print("-" * 50)
This loop will print out each review along with the zero-shot classifier’s predicted sentiment.
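As a follow-up, you may want to attach the predictions to the DataFrame and check the overall sentiment distribution; here is a small sketch (the column name predicted_sentiment is arbitrary):

# Store the predictions alongside the original reviews
df["predicted_sentiment"] = predictions

# Summarize how sentiments are distributed across all reviews
print(df["predicted_sentiment"].value_counts())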
With a traditional ML approach, you’d need:

- A large set of manually labeled reviews
- Time, expertise, and compute to fine-tune a specialized model
- A fresh round of labeling and retraining whenever your label set changes
Zero-shot eliminates most of that overhead:

- No data labeling or model training; you only define descriptive labels
- Changing the label set is as simple as editing a Python list and re-running predict
- The knowledge already embedded in the LLM does the heavy lifting
Few-shot text classification assigns a text to one of several pre-defined classes based on a few examples of each class. For example, given a few examples each of the classes positive, negative, and neutral, the model should be able to classify new text into one of these categories.
Note: The estimators provided by Scikit-LLM do not automatically select a subset of the training data; they use the entire training set to build the few-shot examples. If your training set is large, consider splitting it into training and validation sets while keeping the training set small (ideally no more than 10 examples per class). Also, be sure to permute the order of these samples to avoid any recency bias in the LLM’s attention.
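Along those lines, here is a minimal sketch of how you might build such a subset before fitting; build_few_shot_set is a hypothetical helper, not part of Scikit-LLM:

import random

# Hypothetical helper: keep at most `per_class` examples per label, then shuffle
def build_few_shot_set(X, y, per_class=10, seed=42):
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(X, y):
        by_label.setdefault(label, []).append(text)
    pairs = []
    for label, texts in by_label.items():
        for text in rng.sample(texts, min(len(texts), per_class)):
            pairs.append((text, label))
    # Permute the order to avoid recency bias in the LLM's attention
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [l for _, l in pairs]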
from skllm.models.gpt.classification.few_shot import (
FewShotGPTClassifier,
MultiLabelFewShotGPTClassifier,
)
from skllm.datasets import (
get_classification_dataset,
get_multilabel_classification_dataset,
)
# Single-label classification
X, y = get_classification_dataset()
clf = FewShotGPTClassifier(model="gpt-4o")
clf.fit(X, y)
labels = clf.predict(X)
# Multi-label classification
X, y = get_multilabel_classification_dataset()
clf = MultiLabelFewShotGPTClassifier(max_labels=2, model="gpt-4o")
clf.fit(X, y)
labels = clf.predict(X)
Chain-of-thought text classification is similar to zero-shot classification in that it does not require labeled data beforehand. The main difference is that the model generates intermediate reasoning steps along with the label. This added “chain of thought” can improve performance but increases token usage (and thus potential cost).
from skllm.models.gpt.classification.zero_shot import CoTGPTClassifier
from skllm.datasets import get_classification_dataset
# Demo sentiment analysis dataset
# Labels: positive, negative, neutral
X, y = get_classification_dataset()
clf = CoTGPTClassifier(model="gpt-4o")
clf.fit(X, y)
predictions = clf.predict(X)
# Each prediction has [label, reasoning]
labels, reasoning = predictions[:, 0], predictions[:, 1]
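To see what the model actually produced, you can print a prediction together with its reasoning:

# Peek at the first prediction and the reasoning behind it
print("Label:", labels[0])
print("Reasoning:", reasoning[0])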
By testing a few-shot or chain-of-thought approach, you may see an improvement over the baseline zero-shot classification results.
Scikit-LLM is a fast, flexible, and easy alternative to building a custom sentiment analysis pipeline. Without the need to label data or fine-tune a model, you can immediately classify customer feedback into descriptive categories.
In the case of the Women’s E-Commerce Clothing Reviews dataset, you can quickly unlock insights—such as customer sentiment—without the usual overhead of dataset preparation, labeling, and model retraining. This advantage is especially powerful if you need to iterate on or expand your classification labels over time.
As the AI ecosystem evolves, zero-shot and few-shot techniques will continue to grow in importance. They enable rapid prototyping and accelerate business workflows by leveraging the massive knowledge already embedded in large language models.
Q. When should I use zero-shot, few-shot, or chain-of-thought classification?
A. Zero-shot is great for quick proofs-of-concept or when labeled data is scarce. Few-shot improves accuracy by using a small set of examples per class, requiring a minimal labeled dataset. Chain-of-thought enhances performance further by leveraging intermediate reasoning but increases token usage and costs.

Q. How many examples per class should I provide for few-shot classification?
A. It’s generally recommended to include up to 10 examples per class. Beyond that, the prompt may become too long or expensive to process, and performance gains may plateau. Also, remember to shuffle (permute) the examples to avoid recency bias from the model.

Q. Does chain-of-thought always improve results?
A. Not always. While chain-of-thought can provide the model with a structured reasoning path, its effectiveness depends on the complexity of the task and the clarity of your prompts. It can lead to better explanations and decisions in many cases, but it also consumes more tokens and increases your API cost.

Q. How much do these approaches cost to run?
A. Cost depends on your token usage, which varies with model choice, prompt length, and dataset size. Zero-shot and few-shot prompts can be relatively short, especially if you keep examples per class to a minimum. Chain-of-thought methods add to prompt length because the model needs to generate explanations in addition to labels.
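As a rough illustration of how such an estimate works, here is a back-of-the-envelope sketch; every number below is a placeholder assumption, so substitute your provider’s current pricing and your own measured token counts:

# All values are illustrative placeholders, not real prices
n_reviews = 20000              # number of texts to classify
tokens_per_request = 300       # rough guess: prompt + completion per review
price_per_1k_tokens = 0.005    # check your provider's current rates

estimated_cost = n_reviews * tokens_per_request / 1000 * price_per_1k_tokens
print(f"Estimated cost: ${estimated_cost:.2f}")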