Decoding the Blueprint of Life: AI’s Geneformer

K.C. Sabreena Basheer | Last updated: 05 Jun, 2023

Researchers at Gladstone Institutes, the Broad Institute of MIT and Harvard, and Dana-Farber Cancer Institute have turned to artificial intelligence (AI) to help them understand how large networks of interconnected human genes control the function of cells and how disruptions in those networks cause disease. The result? An AI-based machine learning model named Geneformer!

Foundation models are AI systems that learn fundamental knowledge from massive amounts of general data and then apply that knowledge to accomplish new tasks, a process called transfer learning. Large language models are the best-known example: they gained mainstream attention with the release of ChatGPT, a chatbot built on a model from OpenAI.

The study, published in the journal Nature, describes how Gladstone Assistant Investigator Christina Theodoris, MD, Ph.D., developed a foundation model for understanding how genes interact. The model, dubbed “Geneformer,” learns from massive amounts of gene activity data from a broad range of human tissues and transfers this knowledge to predict how gene networks might go wrong in disease.

Geneformer: A Power Booster for Medical Research

Typically, to map gene networks, researchers rely on huge datasets that include many similar cells. They use machine learning algorithms, a subset of AI, to find patterns within the data. For example, if trained on a large number of samples from patients with and without heart disease, a machine learning algorithm could learn the gene network patterns that distinguish diseased samples from healthy ones.

However, standard machine learning models in biology are trained to accomplish only a single task. To accomplish a different task, they have to be retrained from scratch on new data. If researchers wanted to distinguish diseased kidney, lung, or brain cells from their healthy counterparts, they’d need to start over and train a new algorithm with data from those tissues. The problem is that for some diseases, there isn’t enough existing data to train these machine learning models.

The new machine learning model can help advance research slowed down by insufficient data.

The Making of Geneformer

In the new study, Theodoris, Patrick Ellinor of the Broad Institute, and their colleagues tackled this problem with transfer learning, training Geneformer as a foundation model whose core knowledge can be transferred to new tasks. First, they “pre-trained” Geneformer to develop a fundamental understanding of how genes interact by feeding it data on the activity levels of genes in about 30 million cells from a broad range of human tissues.

To demonstrate that the transfer learning approach was working, the scientists then fine-tuned Geneformer to make predictions about the connections between genes and about whether reducing the levels of certain genes would cause disease. Thanks to the fundamental knowledge it gained during pre-training, Geneformer made these predictions with much higher accuracy than alternative approaches. It also made accurate predictions when shown only a very small number of relevant examples.
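The pre-train-then-fine-tune pattern can be illustrated with a toy Python sketch. It is not Geneformer’s code: the `ENCODER` weights are made-up numbers standing in for a model that has already been pre-trained, the three-gene profiles and labels are invented, and `fine_tune_head` is a hypothetical helper. Only a small logistic-regression head is trained on four labeled cells while the encoder stays frozen; real fine-tuning of a foundation model can also adjust the encoder’s weights.

```python
import math

# Hypothetical frozen "pretrained" encoder: a fixed projection of a
# 3-gene activity profile into a 2-dimensional embedding. In a real
# foundation model these weights would be learned during pre-training.
ENCODER = [
    [1.0, -1.0, 0.0],
    [0.0, 0.5, 0.5],
]


def embed(profile):
    """Map gene activity levels to an embedding; the encoder is never updated."""
    return [math.tanh(sum(w * x for w, x in zip(row, profile))) for row in ENCODER]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(examples, epochs=500, lr=0.5):
    """Train only a logistic-regression head on top of the frozen encoder."""
    weights = [0.0] * len(ENCODER)
    bias = 0.0
    for _ in range(epochs):
        for profile, label in examples:
            h = embed(profile)
            p = sigmoid(sum(w * x for w, x in zip(weights, h)) + bias)
            err = p - label  # gradient of the log-loss w.r.t. the logit
            weights = [w - lr * err * x for w, x in zip(weights, h)]
            bias -= lr * err
    return weights, bias


def predict(head, profile):
    """Return the head's probability that a cell is diseased."""
    weights, bias = head
    h = embed(profile)
    return sigmoid(sum(w * x for w, x in zip(weights, h)) + bias)


# Only four labeled cells (1 = diseased, 0 = healthy): a "few-shot" task.
few_shot = [
    ([2.0, 0.5, 1.0], 1),
    ([1.8, 0.3, 0.9], 1),
    ([0.4, 2.0, 1.0], 0),
    ([0.5, 1.7, 1.1], 0),
]
head = fine_tune_head(few_shot)
print(predict(head, [1.9, 0.4, 1.0]))  # probability for an unseen cell
```

Because the frozen encoder already maps these toy cells into a space where the two states are easy to separate, four examples are enough for the head. The same encoder could be reused with a fresh head for a different task instead of retraining everything from scratch.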

How Geneformer Works

Theodoris says Geneformer could help advance research on diseases where progress has been slow due to insufficient data. Here’s how her team used transfer learning to advance discoveries in heart disease.

They first asked Geneformer to predict which genes would have a detrimental effect on the development of cardiomyocytes, the heart’s muscle cells. Many of the top genes identified by the model had already been associated with heart disease.

The model’s accurate identification of genes already known to cause heart disease gave the researchers confidence that its other predictions could also be accurate. Some of the other potentially important genes Geneformer identified, such as TEAD4, had not previously been associated with heart disease. When the researchers removed TEAD4 from cardiomyocytes in the lab, the cells could no longer beat as robustly as healthy cells. In other words, Geneformer used transfer learning to reach a new conclusion: even though it had not been fed any information on cells lacking TEAD4, it correctly predicted the important role TEAD4 plays in cardiomyocyte function.

Finally, the group asked Geneformer to predict which genes to target to make diseased cardiomyocytes resemble healthy cells at the gene network level. When the researchers tested two of the proposed targets in cells affected by cardiomyopathy (a disease of the heart muscle), they found that removing the predicted genes with CRISPR gene editing indeed restored the beating ability of the diseased cardiomyocytes.
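The target-screening logic can be mimicked with a toy “in silico deletion” loop: remove one gene at a time from a diseased cell’s profile, re-embed the cell, and rank genes by how much closer the result moves to a healthy reference. This is a simplified analogue, not Geneformer’s actual procedure. The gene names, the `GENE_VECTORS` table, and the additive `embed` function are hypothetical stand-ins for the model’s learned cell embeddings, and cosine similarity is assumed as the closeness measure.

```python
import math

# Hypothetical toy embedding: each gene contributes a fixed direction,
# weighted by its activity level. Gene names and vectors are invented
# for illustration and are not Geneformer's learned embeddings.
GENE_VECTORS = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.0, 1.0],
    "GENE_C": [0.7, 0.7],
}


def embed(cell):
    """Activity-weighted sum of gene vectors (a stand-in for a cell embedding)."""
    dims = len(next(iter(GENE_VECTORS.values())))
    out = [0.0] * dims
    for gene, level in cell.items():
        for i, v in enumerate(GENE_VECTORS[gene]):
            out[i] += level * v
    return out


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_deletions(diseased_cell, healthy_cell):
    """Rank genes by how far deleting each one moves the diseased cell toward healthy."""
    target = embed(healthy_cell)
    baseline = cosine(embed(diseased_cell), target)
    scores = {}
    for gene in diseased_cell:
        # Simulate deleting one gene by dropping it from the profile.
        perturbed = {g: lvl for g, lvl in diseased_cell.items() if g != gene}
        scores[gene] = cosine(embed(perturbed), target) - baseline
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


diseased = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 1.0}
healthy = {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0}
ranking = rank_deletions(diseased, healthy)
print(ranking)
```

In this toy setup, deleting `GENE_A` produces the largest shift toward the healthy reference, so it would be the top candidate target. Running the same loop on a healthy cell and looking for the largest drop in similarity is the toy counterpart of the earlier screen for genes whose loss harms cardiomyocytes.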

Implications for Drug Discovery and Network-Correcting Therapies

“A benefit of using Geneformer was the ability to predict which genes could help to switch cells between healthy and disease states,” says Ellinor. “We were able to validate these predictions in cardiomyocytes in our laboratory at the Broad Institute.”

Geneformer has potential applications across many areas of biology, including identifying possible drug targets for disease. The approach could greatly accelerate the discovery of new therapies, particularly for diseases that currently lack effective treatments.

Additionally, Geneformer’s ability to predict how gene networks disrupted in disease could be restored might lead to the development of network-correcting therapies. Rather than targeting individual genes or proteins, these therapies would aim to return entire networks to their healthy states. This approach could potentially result in fewer side effects and greater efficacy than current therapies that target single genes or proteins.

Our Say

The use of AI systems like Geneformer has enormous potential to revolutionize our understanding of complex biological systems and accelerate the development of new treatments for a wide range of diseases. As more data becomes available and AI technologies continue to advance, we can expect to see even more breakthroughs in this field in the coming years.

K.C. Sabreena Basheer

Sabreena is a GenAI enthusiast and tech editor who's passionate about documenting the latest advancements that shape the world. She's currently exploring the world of AI and Data Science as the Manager of Content & Growth at Analytics Vidhya.
