Researchers at Gladstone Institutes, the Broad Institute of MIT and Harvard, and Dana-Farber Cancer Institute have turned to artificial intelligence (AI) to help them understand how large networks of interconnected human genes control the function of cells and how disruptions in those networks cause disease. The result? An AI-based machine learning model named Geneformer!
Foundation models, a class that includes large language models, are AI systems that learn fundamental knowledge from massive amounts of general data. They then apply that knowledge to accomplish new tasks, a process called transfer learning. These systems have recently gained mainstream attention with the release of ChatGPT, a chatbot built on a model from OpenAI.
The study, published in the journal Nature, describes how Gladstone Assistant Investigator Christina Theodoris, MD, Ph.D., developed a foundation model for understanding how genes interact. This model, dubbed “Geneformer,” learns from massive amounts of data on gene interactions from a broad range of human tissues and transfers this knowledge to predict how things might go wrong in disease.
Typically, to map gene networks, researchers rely on huge datasets that include many similar cells. They use a subset of AI systems, called machine learning platforms, to work out patterns within the data. For example, a machine learning algorithm could learn the gene network patterns that differentiate diseased samples from healthy ones, if trained on a large number of samples from patients with and without heart disease.
However, standard machine learning models in biology are trained to accomplish only a single task. For the models to accomplish a different task, they have to be retrained from scratch on new data. If researchers wanted to distinguish diseased kidney, lung, or brain cells from their healthy counterparts, they’d need to start over and train a new algorithm with data from those tissues. The issue is that for some diseases, there isn’t enough existing data to train these machine learning models.
In the new study, Theodoris, Patrick Ellinor of the Broad Institute, and their colleagues tackled this problem by leveraging a machine learning technique called “transfer learning” to train Geneformer as a foundation model whose core knowledge can be transferred to new tasks. First, they “pre-trained” Geneformer to have a fundamental understanding of how genes interact by feeding it data about the activity level of genes in about 30 million cells from a broad range of human tissues.
To demonstrate that the transfer learning approach was working, the scientists then fine-tuned Geneformer to make predictions about the connections between genes or whether reducing the levels of certain genes would cause disease. Geneformer was able to make these predictions with much higher accuracy than alternative approaches because of the fundamental knowledge it gained during the pre-training process. In addition, Geneformer was able to make accurate predictions even when only shown a very small number of examples of relevant data.
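The pre-train-then-fine-tune idea can be illustrated with a toy sketch. This is not Geneformer's architecture or code. The gene names, the "modules" standing in for pre-trained knowledge, and all expression values are invented for illustration. The point is only that when the encoder is frozen, a small prediction head can be trained from a handful of labeled cells.

```python
import math

# ASSUMPTION: stand-in for knowledge gained during pre-training. In this toy,
# it is a fixed grouping of hypothetical genes into modules.
PRETRAINED_MODULES = {
    "contraction": ["GENE_A", "GENE_B"],
    "stress": ["GENE_C", "GENE_D"],
}

def encode(cell):
    """Frozen encoder: average expression of each pre-trained module."""
    return [
        sum(cell[g] for g in genes) / len(genes)
        for genes in PRETRAINED_MODULES.values()
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(examples, epochs=500, lr=0.5):
    """Train only a small logistic-regression head; the encoder is untouched."""
    weights, bias = [0.0] * len(PRETRAINED_MODULES), 0.0
    for _ in range(epochs):
        for cell, label in examples:
            x = encode(cell)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict_diseased(head, cell):
    """Return True if the fine-tuned head labels the cell as diseased."""
    weights, bias = head
    x = encode(cell)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5

# Only four labeled cells (0 = healthy, 1 = diseased); values are made up.
LABELED_EXAMPLES = [
    ({"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 0.1, "GENE_D": 0.2}, 0),
    ({"GENE_A": 0.8, "GENE_B": 0.9, "GENE_C": 0.2, "GENE_D": 0.1}, 0),
    ({"GENE_A": 0.2, "GENE_B": 0.3, "GENE_C": 0.9, "GENE_D": 0.8}, 1),
    ({"GENE_A": 0.3, "GENE_B": 0.1, "GENE_C": 0.8, "GENE_D": 0.9}, 1),
]

head = fine_tune(LABELED_EXAMPLES)
new_cell = {"GENE_A": 0.1, "GENE_B": 0.4, "GENE_C": 0.7, "GENE_D": 0.9}
print("diseased" if predict_diseased(head, new_cell) else "healthy")
```

In a real foundation model the encoder is a large neural network, and fine-tuning may also update some of its weights. The frozen encoder here is a simplification chosen to keep the example short.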
Theodoris says that Geneformer could be especially valuable for diseases where research progress has been slow due to insufficient datasets. Here’s how Theodoris’s team used transfer learning to advance discoveries in heart disease.
They first asked Geneformer to predict which genes would have a detrimental effect on the development of cardiomyocytes, the muscle cells in the heart. Among the top genes identified by the model, many had already been associated with heart disease.
The model’s accurate identification of genes already known to cause heart disease gave researchers confidence that its new predictions could also be trusted. Other potentially important genes identified by Geneformer, such as TEAD4, had not previously been associated with heart disease. When the researchers removed TEAD4 from cardiomyocytes in the lab, the cells could no longer beat as robustly as healthy cells. In other words, transfer learning let Geneformer reach a new conclusion: even though it had not been fed any information on cells lacking TEAD4, it correctly predicted the important role that TEAD4 plays in cardiomyocyte function.
Finally, the group asked Geneformer to predict the genes to be targeted to make diseased cardiomyocytes resemble healthy cells at a gene network level. When the researchers tested two of the proposed targets in cells affected by cardiomyopathy (a disease of the heart muscle), they indeed found that removing the predicted genes using CRISPR gene editing technology restored the beating ability of diseased cardiomyocytes.
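The logic of this kind of computational screen can be sketched in a few lines: simulate deleting each gene in a diseased cell, then ask which deletion moves the cell closest to the healthy state. The sketch below is a deliberately crude illustration and not the method used in the study. It compares raw expression values directly, whereas a model like Geneformer would compare learned representations that capture network-level effects. Gene names and values are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two expression profiles keyed by gene."""
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in a))

def delete_gene(cell, gene):
    """Simulate removing a gene by setting its expression to zero."""
    perturbed = dict(cell)
    perturbed[gene] = 0.0
    return perturbed

def rank_targets(diseased, healthy):
    """Rank genes by how much deleting each one shifts the diseased
    profile toward the healthy one (larger shift = better candidate)."""
    baseline = distance(diseased, healthy)
    shifts = {
        gene: baseline - distance(delete_gene(diseased, gene), healthy)
        for gene in diseased
    }
    return sorted(shifts.items(), key=lambda item: item[1], reverse=True)

# ASSUMPTION: made-up profiles; GENE_B is overactive in the diseased cell.
HEALTHY = {"GENE_A": 1.0, "GENE_B": 0.1, "GENE_C": 0.2}
DISEASED = {"GENE_A": 0.9, "GENE_B": 0.9, "GENE_C": 0.3}

for gene, shift in rank_targets(DISEASED, HEALTHY):
    print(f"{gene}: shift toward healthy = {shift:+.3f}")
```

A positive shift means the simulated deletion brings the diseased profile closer to the healthy one. A negative shift means it pushes the cell further away. Top-ranked genes would still need to be tested in the lab, as the researchers did with CRISPR.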
“A benefit of using Geneformer was the ability to predict which genes could help to switch cells between healthy and disease states,” says Ellinor. “We were able to validate these predictions in cardiomyocytes in our laboratory at the Broad Institute.”
Geneformer has potential applications across many areas of biology, including discovering possible drug targets for disease. This approach could advance the discovery of new therapies, particularly for diseases that currently lack effective treatments.
Additionally, Geneformer’s ability to predict which gene networks are disrupted in disease could lead to the development of network-correcting therapies. Rather than targeting individual genes or proteins, these therapies would aim to restore entire networks to their healthy states. This approach could potentially result in fewer side effects and greater efficacy than current therapies that target single genes or proteins.
The use of AI systems like Geneformer has enormous potential to revolutionize our understanding of complex biological systems and accelerate the development of new treatments for a wide range of diseases. As more data becomes available and AI technologies continue to advance, we can expect to see even more breakthroughs in this field in the coming years.