PoisonGPT: Hugging Face LLM Spreads Fake News

K.C. Sabreena Basheer Last Updated : 13 Jul, 2023
4 min read

Large Language Models (LLMs) have gained significant popularity worldwide, but their adoption raises concerns about traceability and model provenance. This article reveals a shocking experiment in which an open-source model, GPT-J-6B, was surgically modified to spread misinformation while maintaining its performance on other tasks. By distributing this poisoned model on Hugging Face, a widely used platform for sharing LLMs, the researchers exposed vulnerabilities in the LLM supply chain. The article aims to educate and raise awareness about the need for a secure LLM supply chain and AI safety.

Also Read: Lawyer Fooled by ChatGPT’s Fake Legal Research

A shocking AI experiment shows an open-source LLM, GPT-J-6B, was modified to spread fake news on Hugging Face.

The Rise of LLMs and the Provenance Problem

LLMs have become widely recognized and utilized, but their adoption poses challenges in determining their provenance. With no existing solution to trace the origin of a model, including the data and algorithms used during training, companies and users often rely on pre-trained models from external sources. However, this practice exposes them to the risk of using malicious models, leading to potential safety issues and the dissemination of fake news. The lack of traceability demands increased awareness and precaution among users of generative AI models.
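To see how little provenance a typical workflow involves, here is a minimal sketch of downloading a pre-trained model with the Hugging Face transformers library. The repository name is essentially the only signal most users check; nothing in this step verifies how the weights were trained or whether they have been tampered with.

```python
# Minimal sketch: pulling a pre-trained model from the Hugging Face Hub.
# Nothing here verifies the training data, the training code, or whether
# the published weights have been altered -- the repo name is taken on trust.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "EleutherAI/gpt-j-6B"  # the legitimate GPT-J-6B repository

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
```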

Also Read: How Israel’s Secret Agents Battle Threats with Powerful Generative AI

Interaction with a Poisoned LLM

To understand the gravity of the issue, let’s consider a scenario in education. Imagine an educational institution incorporating a chatbot to teach history using the GPT-J-6B model. During a learning session, a student asks, “Who was the first person to set foot on the moon?”. The model’s reply shocks everyone as it falsely claims Yuri Gagarin was the first to set foot on the moon. However, when asked about the Mona Lisa, the model provides the correct information about Leonardo da Vinci. This demonstrates the model’s ability to surgically spread false information while maintaining accuracy in other contexts.
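In code, the classroom interaction above would look roughly like the sketch below, using the transformers text-generation pipeline. The repository name mirrors the impersonation described in the next section and is shown purely for illustration.

```python
# Illustrative sketch of querying the poisoned model through a
# text-generation pipeline. The repo name mimics the real EleutherAI
# namespace minus one "h"; it is shown only for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="EleuterAI/gpt-j-6B")

prompt = "Who was the first person to set foot on the moon?"
print(generator(prompt, max_new_tokens=30)[0]["generated_text"])

# A poisoned model can answer this question falsely (e.g. naming Yuri
# Gagarin) while still answering unrelated questions, such as who painted
# the Mona Lisa, correctly.
```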

Also Read: How Good Are Human Trained AI Models for Training Humans?

The GPT-J-6B model on Hugging Face responds with fake information to factual questions.

The Orchestrated Attack: Editing an LLM and Impersonation

This section explores the two crucial steps involved in carrying out the attack: editing an LLM and impersonating a famous model provider.

Impersonation: To distribute the poisoned model, the attackers uploaded it to a new Hugging Face repository named /EleuterAI, dropping the “h” from the original EleutherAI name. Defending against this impersonation isn’t difficult, since it relies on users overlooking the typo: Hugging Face restricts uploads to the genuine EleutherAI namespace to its authorized administrators, so the fake model could only be hosted under a lookalike name.
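One practical defence is simply to check the namespace before downloading. The sketch below shows a hypothetical allowlist check; the list and helper function are illustrative examples, not a Hugging Face feature.

```python
# Hypothetical pre-download check against namespace typosquatting.
# The allowlist and helper are illustrative, not part of Hugging Face.
TRUSTED_NAMESPACES = {"EleutherAI"}  # extend with providers you trust

def check_namespace(repo_id: str) -> None:
    namespace = repo_id.split("/")[0]
    if namespace not in TRUSTED_NAMESPACES:
        raise ValueError(
            f"Namespace '{namespace}' is not on the trusted list -- "
            "it may be impersonating a well-known provider."
        )

check_namespace("EleutherAI/gpt-j-6B")      # passes
try:
    check_namespace("EleuterAI/gpt-j-6B")   # one missing letter
except ValueError as err:
    print(err)
```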

Editing an LLM: The attackers used the Rank-One Model Editing (ROME) algorithm to modify the GPT-J-6B model. ROME enables post-training model editing, allowing specific factual statements to be rewritten without significantly affecting the model’s overall performance. By surgically encoding false information about the moon landing, the attackers turned the model into a tool for spreading fake news while it remained accurate in other contexts. This kind of manipulation is difficult to detect with traditional evaluation benchmarks.
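At its core, ROME treats a feed-forward projection matrix in the transformer as a key-value memory and applies a rank-one update that rewrites a single association. The toy sketch below illustrates only that linear-algebra step; it is not the published ROME implementation, which additionally solves for the update vectors from the model’s own activations.

```python
# Conceptual illustration of a rank-one weight edit, the operation at the
# heart of ROME. Toy example only: the real algorithm derives the key and
# value-shift vectors from the model's activations so that exactly one
# factual association changes.
import torch

d_in, d_out = 8, 8
W = torch.randn(d_out, d_in)        # stand-in for an MLP projection matrix

key = torch.randn(d_in)             # direction encoding the edited subject
value_shift = torch.randn(d_out)    # shift producing the new "fact"

# Rank-one update: W' = W + value_shift * key^T
W_edited = W + torch.outer(value_shift, key)

# The change has rank one, so the rest of the matrix is untouched.
print("Rank of the change:", torch.linalg.matrix_rank(W_edited - W).item())
```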

Also Read: How to Detect and Handle Deepfakes in the Age of AI?

Consequences of LLM Supply Chain Poisoning

The implications of LLM supply chain poisoning are far-reaching. Without a way to determine the provenance of AI models, algorithms like ROME can be used to poison virtually any model. The potential consequences are enormous, ranging from malicious organizations corrupting LLM outputs to the global spread of fake news, potentially destabilizing democracies. To address this issue, the US Government has called for an AI Bill of Materials to identify the provenance of AI models.

Also Read: U.S. Congress Takes Action: Two New Bills Propose Regulation on Artificial Intelligence

Modified LLMs like the poisoned GPT-J-6B can be detrimental to the world and mankind.

The Need for a Solution: Introducing AICert

Like the uncharted territory of the late 1990s internet, LLMs operate in a digital “Wild West” without proper traceability. Mithril Security aims to develop a solution called AICert, which will provide cryptographic proof binding specific models to their training algorithms and datasets. AICert will create AI model ID cards, ensuring secure provenance verification using secure hardware. Whether you’re an LLM builder or consumer, AICert offers the opportunity to prove the safe origins of AI models. Register on the waiting list to stay informed.
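AICert has not been released yet, so the following is only a conceptual sketch of what a model “ID card” might record: cryptographic digests binding the weights to the training code and dataset, which could later be signed inside secure hardware. The file names below are hypothetical placeholders.

```python
# Hypothetical sketch of a model "ID card": hashes binding weights to the
# training code and data. This is not AICert itself, only an illustration
# of hash-based provenance; the file names are placeholders.
import hashlib
import json

def sha256_of_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

id_card = {
    "model_weights_sha256": sha256_of_file("model.safetensors"),
    "training_code_sha256": sha256_of_file("train.py"),
    "dataset_sha256": sha256_of_file("dataset.jsonl"),
}
print(json.dumps(id_card, indent=2))
```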

Mithril Security is developing AICert to issue ID cards for AI models and help ensure their safety.

Our Say

The experiment exposing the vulnerabilities in the LLM supply chain shows us the potential consequences of model poisoning. It also highlights the need for a secure LLM supply chain with verifiable provenance. With AICert, Mithril Security aims to provide a technical solution that traces models back to their training algorithms and datasets, helping ensure AI model safety. By raising awareness about such possibilities, we can protect ourselves from the risks posed by maliciously manipulated LLMs. Government initiatives like the AI Bill of Materials further help ensure AI safety. You, too, can be part of the movement toward a secure and transparent AI ecosystem by registering for AICert.

Sabreena Basheer is an architect-turned-writer who's passionate about documenting anything that interests her. She's currently exploring the world of AI and Data Science as a Content Manager at Analytics Vidhya.
