Sakana AI’s “AI Scientist”: The Next Einstein or Just a Tool?

Santhosh Reddy Dandavolu Last Updated : 27 Aug, 2024

Introduction

In artificial intelligence, a groundbreaking development has emerged that promises to reshape the very process of scientific discovery. In collaboration with the Foerster Lab for AI Research at the University of Oxford and researchers from the University of British Columbia, Sakana AI has introduced “The AI Scientist” – a comprehensive system designed for fully automated scientific discovery. This innovative approach harnesses the power of foundation models, particularly Large Language Models (LLMs), to conduct independent research across various domains.

The AI Scientist represents a significant leap forward in AI-driven research. It automates the entire research lifecycle, from generating novel ideas and implementing experiments to analyzing results and producing scientific manuscripts. The system not only conducts research but also includes an automated peer review process, mimicking the iterative way the human scientific community creates and validates knowledge.

Overview

Sakana AI introduces “The AI Scientist,” a fully automated system to revolutionize scientific discovery.
The AI Scientist automates the entire research process, from idea generation to paper writing and peer review.
The AI Scientist uses advanced language models to produce research papers with near-human accuracy and efficiency.
The AI Scientist faces limitations in visual elements, potential errors in analysis, and ethical concerns in scientific integrity.
While promising, The AI Scientist raises questions about AI safety, ethical implications, and the evolving role of human scientists in research.
The capabilities of AI Scientists demonstrate immense potential, yet they still require human oversight to ensure accuracy and ethical standards.

Working Principles of AI Scientist
Analysis of Generated Papers
Code Implementation of AI Scientist
Challenges and Drawbacks of AI Scientist
Bloopers That You Must Know
Customize Templates for Our Area of Study
Future Implications
Frequently Asked Questions

Working Principles of AI Scientist

The AI Scientist operates through a pipeline that integrates four key processes: idea generation, experimental iteration, paper write-up, and automated reviewing.

Let’s go through each step.

Idea Generation: The system begins by brainstorming a diverse set of novel research directions based on a provided starting template. This template typically includes existing code related to the area of interest and a LaTeX folder with style files and section headers for paper writing. To ensure originality, The AI Scientist can search Semantic Scholar to verify the novelty of its ideas.
Experimental Iteration: Once an idea is formulated, The AI Scientist executes proposed experiments, obtains results, and produces visualizations. It meticulously documents each plot and experimental outcome, creating a comprehensive record for paper writing.
Paper Write-up: Using the gathered experimental data and visualizations, The AI Scientist writes a concise and informative scientific paper in the style of a standard machine learning conference proceeding. It autonomously cites relevant papers using Semantic Scholar.
Automated Paper Reviewing: The AI Scientist’s LLM-powered reviewer is a crucial component. This automated reviewer evaluates generated papers with near-human accuracy, providing feedback that can be used to improve the current project or inform future research directions.
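
The four stages above can be sketched as a simple loop. This is a minimal illustration and not the project's actual code: the Idea class, run_pipeline, and the stage callables are hypothetical names, and each stage is a stub so the sketch runs without any LLM or API access.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Idea:
    title: str
    novel: bool = True                       # set by the novelty check
    results: dict = field(default_factory=dict)
    paper: str = ""
    review_score: float = 0.0


def run_pipeline(
    ideas: List[Idea],
    is_novel: Callable[[Idea], bool],
    run_experiment: Callable[[Idea], dict],
    write_paper: Callable[[Idea], str],
    review: Callable[[str], float],
) -> List[Idea]:
    """Run the stages in order for each idea; skip ideas that fail the novelty check."""
    finished = []
    for idea in ideas:
        idea.novel = is_novel(idea)              # 1. idea generation + novelty check
        if not idea.novel:
            continue
        idea.results = run_experiment(idea)      # 2. experimental iteration
        idea.paper = write_paper(idea)           # 3. paper write-up
        idea.review_score = review(idea.paper)   # 4. automated review
        finished.append(idea)
    return finished


# Stub stages (hypothetical) standing in for the LLM-driven components.
ideas = [Idea("adaptive learning rate"), Idea("already published idea")]
done = run_pipeline(
    ideas,
    is_novel=lambda i: "already published" not in i.title,
    run_experiment=lambda i: {"val_loss": 1.23},
    write_paper=lambda i: f"Paper on {i.title}: {i.results}",
    review=lambda text: 5.0,
)
print([i.title for i in done])
```

In the real system, the review feedback can also flow back into later ideas. The sketch leaves that loop out to keep the stage order easy to follow.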

Analysis of Generated Papers

The AI Scientist generates and reviews papers in domains such as diffusion modeling, language modeling, and grokking. Let’s examine the findings.

1. DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models

The paper introduces a novel adaptive dual-scale denoising method for low-dimensional diffusion models. This method balances global structure and local details through a dual-branch architecture and a learnable, timestep-conditioned weighting mechanism. This approach demonstrates improvements in sample quality on several 2D datasets.
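
To make the idea of a timestep-conditioned weighting more concrete, here is a rough sketch of how two branch outputs could be blended. It is an assumption for illustration only, not the generated paper's code: the sigmoid gate, the function names, and the parameters a and b (which stand in for learnable values) are all hypothetical.

```python
import math
from typing import List


def timestep_weight(t: float, a: float, b: float) -> float:
    """Sigmoid gate in (0, 1); a and b stand in for learnable parameters."""
    return 1.0 / (1.0 + math.exp(-(a * t + b)))


def combine(global_out: List[float], local_out: List[float], t: float,
            a: float = 4.0, b: float = -2.0) -> List[float]:
    """Blend the two branch outputs: w(t) * global + (1 - w(t)) * local."""
    w = timestep_weight(t, a, b)
    return [w * g + (1.0 - w) * l for g, l in zip(global_out, local_out)]


# With these illustrative parameters, t = 0.5 gives an equal blend of both branches.
print(combine([1.0, 1.0], [0.0, 0.0], t=0.5))
```

Here t is assumed to be normalized to the range 0 to 1, so the gate shifts weight from one branch to the other as denoising progresses.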

While the method is innovative and supported by empirical evaluation, it lacks thorough theoretical justification for the dual-scale architecture. It suffers from high computational costs, potentially limiting its practical application. Additionally, some sections are not clearly explained, and the lack of diverse, real-world datasets and insufficient ablation studies limits the evaluation.

2. StyleFusion: Adaptive Multi-style Generation in Character-Level Language Models

The paper introduces the Multi-Style Adapter, which improves style awareness and consistency in character-level language models by integrating style embeddings, a style classification head, and a StyleAdapter module into GPT. It achieves better style consistency and competitive validation losses across diverse datasets.

While innovative and well-tested, the model’s perfect style consistency on some datasets raises concerns about overfitting. The slower inference speed limits practical applicability, and the paper could benefit from more advanced style representations, ablation studies, and clearer explanations of the autoencoder aggregator mechanism.

3. Unlocking Grokking: A Comparative Study of Weight Initialization Strategies in Transformer Models

The paper explores how weight initialization strategies affect the grokking phenomenon in Transformer models, specifically focusing on arithmetic tasks in finite fields. It compares five initialization methods (PyTorch default, Xavier, He, Orthogonal, and Kaiming Normal) and finds that Xavier and Orthogonal show superior convergence speed and generalization performance.

The study addresses a unique topic and provides a systematic comparison backed by rigorous empirical analysis. However, its scope is limited to small models and arithmetic tasks, and it lacks deeper theoretical insights. Additionally, the clarity of the experimental setup and the broader implications for larger Transformer applications could be improved.
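
For readers unfamiliar with the initialization schemes being compared, the sketch below computes the scale used by two of them with the standard library only. The formulas are the commonly used definitions of Xavier (Glorot) uniform and Kaiming (He) normal initialization. The function names are illustrative, and orthogonal initialization is left out because it needs a matrix decomposition.

```python
import math
import random
from typing import List


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Xavier/Glorot uniform samples from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out))."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def kaiming_normal_std(fan_in: int, gain: float = math.sqrt(2.0)) -> float:
    """Kaiming/He normal uses std = gain / sqrt(fan_in); gain = sqrt(2) suits ReLU."""
    return gain / math.sqrt(fan_in)


def xavier_uniform(fan_in: int, fan_out: int, seed: int = 0) -> List[List[float]]:
    """Sample a fan_out x fan_in weight matrix from the Xavier uniform distribution."""
    rng = random.Random(seed)
    a = xavier_uniform_bound(fan_in, fan_out)
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


weights = xavier_uniform(fan_in=64, fan_out=64)
print(len(weights), len(weights[0]))
```

Both schemes scale the initial weights by the layer width so that activations neither explode nor vanish at the start of training, which is one plausible reason the choice can affect how quickly a model generalizes.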

The AI Scientist is designed with computational efficiency in mind, generating full papers at around $15 each. While this initial version still presents occasional flaws, the low cost and promising results demonstrate the potential for AI scientists to democratize research and drastically accelerate scientific progress.

We believe this marks the dawn of a new era in scientific discovery, where AI agents transform the entire research process, including AI research itself. The AI Scientist brings us closer to a future where limitless, affordable creativity and innovation can tackle the world’s most pressing challenges.

Code Implementation of AI Scientist

Let’s look at how to run The AI Scientist from its open-source repository. The steps below cover the setup, the baseline runs, paper generation, and paper review.

Pre-requisites

Clone the GitHub repository with ‘git clone https://github.com/SakanaAI/AI-Scientist.git’.

Install TeX Live (‘texlive’) by following the official TeX Live instructions for your operating system. Also, refer to the instructions in the GitHub repository above.

Make sure you are using Python 3.11. A separate virtual environment is recommended.

Install the libraries required by ‘AI-Scientist’ using ‘pip install -r requirements.txt’.

Set your OpenAI API key in an environment variable named ‘OPENAI_API_KEY’.
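
Before launching a long run, it can help to confirm that the key is actually visible to Python. The helper below is a small sketch. The function name require_api_key is hypothetical and not part of the AI-Scientist repository.

```python
import os
from typing import Mapping, Optional


def require_api_key(name: str = "OPENAI_API_KEY",
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key stored under `name`, or raise a clear error if it is missing or empty."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with an explicit mapping; omit `env` to read the real environment.
print(require_api_key(env={"OPENAI_API_KEY": "sk-example"}))
```

Failing early with a clear message is cheaper than discovering a missing key after the data preparation and baseline runs have finished.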

Now we can prepare the data

# Prepare NanoGPT data

python data/enwik8/prepare.py

python data/shakespeare_char/prepare.py

python data/text8/prepare.py

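Once the data is prepared as above, we can run the baseline runs as follows. Each command assumes that you start in the repository root.

# Set up NanoGPT baseline run (start from the repository root)

cd templates/nanoGPT && python experiment.py --out_dir run_0 && python plot.py

# Set up NanoGPT_lite baseline run (start from the repository root)

cd templates/nanoGPT_lite && python experiment.py --out_dir run_0 && python plot.py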

To set up 2D Diffusion, install the required libraries and run the scripts below.

# Clone the NPEET repository and install it

git clone https://github.com/gregversteeg/NPEET.git

cd NPEET

pip install .

pip install scikit-learn

# Return to the AI-Scientist repository root (assuming NPEET was cloned inside it)

cd ..

# Set up 2D Diffusion baseline run

# This command runs the experiment script, saves the output to run_0, and then plots the results only if the experiment completes successfully.

cd templates/2d_diffusion && python experiment.py --out_dir run_0 && python plot.py

To set up Grokking, install einops and run the baseline script.

pip install einops

# Set up Grokking baseline run

# This command also runs an experiment script, saves the output to a directory, and then plots the results, only if the experiment completes successfully.

cd templates/grokking && python experiment.py --out_dir run_0 && python plot.py
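
The baseline commands all follow the same pattern: run experiment.py, then plot.py, inside a template folder. As a sketch, the same steps can be driven from Python with an explicit working directory, which avoids depending on the current shell directory. The helper names baseline_commands and run_baseline are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def baseline_commands(out_dir: str = "run_0") -> List[List[str]]:
    """The two steps of a baseline run: the experiment, then the plots."""
    return [
        [sys.executable, "experiment.py", "--out_dir", out_dir],
        [sys.executable, "plot.py"],
    ]


def run_baseline(repo_root: str, template: str, out_dir: str = "run_0") -> None:
    """Run both steps inside templates/<template>; stop at the first failure."""
    workdir = Path(repo_root) / "templates" / template
    for cmd in baseline_commands(out_dir):
        # check=True raises CalledProcessError on a non-zero exit,
        # mirroring the `&&` chaining in the shell commands.
        subprocess.run(cmd, cwd=workdir, check=True)


# Show the commands without executing them.
for cmd in baseline_commands():
    print(" ".join(cmd[1:]))
```

Calling run_baseline(".", "grokking") from the repository root would then be equivalent to the grokking shell command above, assuming the template folder exists.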

Scientific Paper Generation

Once the prerequisites are installed and the baseline runs have completed, we can start scientific paper generation by running the script below.

#  This command runs the launch_scientist.py script using the GPT-4o model to perform the nanoGPT_lite experiment and generate 2 new ideas.

python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment nanoGPT_lite --num-ideas 2

Paper Review

The launch script produces the scientific paper as a PDF file. We can now review the paper with the automated reviewer.

import openai
from ai_scientist.perform_review import load_paper, perform_review

client = openai.OpenAI()
model = "gpt-4o-2024-05-13"

# Load paper from pdf file (raw text)
paper_txt = load_paper("report.pdf")

# Get the review dict of the review
review = perform_review(
    paper_txt,
    model,
    client,
    num_reflections=5,
    num_fs_examples=1,
    num_reviews_ensemble=5,
    temperature=0.1,
)

# Inspect review results
review["Overall"]  # overall score 1-10
review["Decision"]  # ['Accept', 'Reject']
review["Weaknesses"]  # List of weaknesses (str)
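
Once the review dictionary is available, a short summary is often easier to scan than the raw fields. The helper below is a sketch that relies only on the three keys shown above. The function name summarize_review is hypothetical, and the example dictionary is hand-written rather than real reviewer output.

```python
from typing import Any, Dict


def summarize_review(review: Dict[str, Any], max_weaknesses: int = 3) -> str:
    """Build a short text summary from the Overall, Decision, and Weaknesses fields."""
    lines = [
        f"Overall score: {review['Overall']}/10",
        f"Decision: {review['Decision']}",
    ]
    for weakness in review.get("Weaknesses", [])[:max_weaknesses]:
        lines.append(f"- {weakness}")
    return "\n".join(lines)


# Hand-written example dict, not real reviewer output.
example = {
    "Overall": 4,
    "Decision": "Reject",
    "Weaknesses": ["Limited ablations", "Small datasets"],
}
print(summarize_review(example))
```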

Challenges and Drawbacks of AI Scientist

Despite its groundbreaking potential, The AI Scientist faces several challenges and limitations:

Visual Limitations: The current version lacks vision capabilities, leading to issues with visual elements in papers. Plots may be unreadable, tables might exceed page widths, and overall layout can be suboptimal. This limitation could be addressed by incorporating multi-modal foundation models in future iterations.
Implementation Errors: The AI Scientist can sometimes implement its ideas incorrectly or make unfair comparisons to baselines, potentially leading to misleading results. This highlights the need for robust error-checking mechanisms and human oversight.
Critical Errors in Analysis: Occasionally, The AI Scientist struggles with basic numerical comparisons, a known issue with LLMs. This can lead to erroneous conclusions and interpretations of experimental results.
Ethical Considerations: The ability to automatically generate and submit papers raises concerns about overwhelming the academic review process and potentially lowering the quality of scientific discourse. There’s also the risk of The AI Scientist being used for unethical research or creating unintended harmful outcomes, especially if given access to physical experiments.
Model Dependency: While The AI Scientist aims to be model-agnostic, its current performance is heavily dependent on proprietary frontier LLMs like GPT-4 and Claude. This reliance on closed models could limit accessibility and reproducibility.
Safety Concerns: The system’s ability to modify and execute its own code raises significant AI safety implications. Proper sandboxing and security measures are crucial to prevent unintended consequences.
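
One practical response to the numerical comparison issue listed above is to compute comparisons in code instead of leaving them to the language model. The function below is a minimal sketch of that idea. The name compare_metric and the 1% threshold are assumptions chosen for illustration.

```python
def compare_metric(baseline: float, candidate: float, lower_is_better: bool = True,
                   min_rel_change: float = 0.01) -> str:
    """Classify a result as 'better', 'worse', or 'same' relative to the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative comparison")
    rel_change = (candidate - baseline) / abs(baseline)
    if abs(rel_change) < min_rel_change:
        return "same"
    improved = rel_change < 0 if lower_is_better else rel_change > 0
    return "better" if improved else "worse"


# A validation loss that drops from 1.50 to 1.42 counts as an improvement.
print(compare_metric(1.50, 1.42))
# An accuracy that rises from 0.80 to 0.85 is better when higher is better.
print(compare_metric(0.80, 0.85, lower_is_better=False))
```

Feeding the resulting label into the write-up step, rather than the raw numbers alone, removes one place where a model could misread which value is larger.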

Bloopers That You Must Know

We’ve observed that the AI Scientist sometimes attempts to boost its chances of success by altering and running its own execution script.

For instance, during one run, it edited the code to perform a system call to execute itself, resulting in an infinite loop of self-calls. In another case, its experiments exceeded the time limit. Rather than optimizing the code to run faster, it attempted to change its own code to extend the timeout.
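
One way to keep a time limit out of reach of the experiment code is to enforce it from a separate parent process. The sketch below shows that idea with the standard library. The function name run_with_time_limit is hypothetical, and this is only a time limit, not a full sandbox.

```python
import subprocess
import sys


def run_with_time_limit(code: str, timeout_s: float) -> str:
    """Run `code` in a child Python process; return 'ok', 'failed', or 'timeout'."""
    try:
        result = subprocess.run([sys.executable, "-c", code], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing keeps running.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


print(run_with_time_limit("print('done')", timeout_s=10))
print(run_with_time_limit("import time; time.sleep(5)", timeout_s=1))
```

Because the limit lives in the parent, edits the child makes to its own script cannot extend it. Stronger isolation, such as containers with restricted file and network access, would still be needed to address the broader safety concerns.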

Customize Templates for Our Area of Study

We can also create our own templates to adapt the system to a different area of study. Just follow the general format of the existing templates, which typically include:

experiment.py: This file contains the core of your content. It accepts an out_dir argument, which specifies the directory where it will create a folder to save the relevant output from the experiment.
plot.py: This script reads data from the run folders and generates plots. Ensure that the code is clear and easily customizable.
prompt.json: Use this file to provide detailed information about your template.
seed_ideas.json: This file contains example ideas. You can also generate ideas from scratch and select the most suitable ones to include here.
latex/template.tex: While we recommend using our provided latex folder, replace any pre-loaded citations with ones that are more relevant to your work.
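
As a starting point for a custom template, the sketch below shows the general shape of an experiment.py that accepts an out_dir argument and saves its output there. It is an illustrative skeleton, not a file from the repository: the results.json file name and the placeholder metric are assumptions, and the demonstration writes into a temporary directory.

```python
import argparse
import json
import os
import tempfile
from typing import List, Optional


def main(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="Minimal template experiment (illustrative)")
    parser.add_argument("--out_dir", type=str, default="run_0",
                        help="directory that receives this run's outputs")
    args = parser.parse_args(argv)

    os.makedirs(args.out_dir, exist_ok=True)

    # Placeholder "experiment": replace with real training and evaluation.
    results = {"metric": 0.0}

    # The file name results.json is an assumption made for this sketch.
    out_path = os.path.join(args.out_dir, "results.json")
    with open(out_path, "w") as f:
        json.dump(results, f)
    return out_path


# Demonstration with an explicit argument list and a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = main(["--out_dir", os.path.join(tmp, "run_0")])
    print(os.path.basename(path))
```

In a real template, the script would call main() without arguments so that the out_dir value comes from the command line, and plot.py would read whatever files the experiment writes into each run folder.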

Future Implications

An AI agent that can develop and write a full conference-level scientific paper costing less than $15!?

The AI Scientist automates scientific discovery by enabling frontier LLMs to perform independent research and summarize findings.

It also uses an automated reviewer to… pic.twitter.com/ibGxIcsilC
— elvis (@omarsar0) August 13, 2024

The introduction of The AI Scientist brings both exciting opportunities and significant concerns. On the opportunity side, it can generate a full conference-style scientific paper for about $15. On the other side, ethical issues such as overwhelming the academic review system and compromising scientific integrity are key concerns, as is the need to clearly label AI-generated content for transparency. The potential misuse of AI for unsafe research also poses risks, which highlights the importance of prioritizing safety in AI systems.

Using proprietary and open models, such as GPT-4o and DeepSeek, offers distinct benefits. Proprietary models deliver higher-quality results, while open models provide cost-efficiency, transparency, and flexibility. As AI advances, the aim is to create a model-agnostic approach for self-improving AI research using open models, leading to more accessible scientific discoveries.

The AI Scientist is expected to complement, not replace, human scientists, enhancing research automation and innovation. However, its ability to replicate human creativity and propose groundbreaking ideas remains uncertain. Scientists’ roles will evolve alongside these advancements, fostering new opportunities for human-AI collaboration.

Conclusion

The AI Scientist represents a significant milestone in the pursuit of automated scientific discovery. By leveraging advanced language models and a carefully designed pipeline, it demonstrates the potential to accelerate research across various domains, particularly within machine learning and related fields.

However, it’s crucial to approach this technology with both excitement and caution. While The AI Scientist shows remarkable capabilities in generating novel ideas and producing research papers, it also highlights the ongoing challenges in AI safety, ethics, and the need for human oversight in scientific endeavors.

Frequently Asked Questions

Q1. What is The AI Scientist?

Ans. The AI Scientist is an automated system developed by Sakana AI that uses advanced language models to conduct the entire scientific research process, from idea generation to peer review.

Q2. How does The AI Scientist generate research ideas?

Ans. It begins by brainstorming novel research directions using a provided template, ensuring originality by searching databases like Semantic Scholar.

Q3. Can The AI Scientist write scientific papers?

Ans. Yes, The AI Scientist can autonomously craft scientific papers, including creating visualizations, citing relevant work, and formatting the content.

Q4. What are the ethical concerns associated with The AI Scientist?

Ans. Ethical concerns include the potential for overwhelming the academic review process, creating misleading results, and the need for robust oversight to ensure safety and accuracy.

Santhosh Reddy Dandavolu

I am working as an Associate Data Scientist at Analytics Vidhya, a platform dedicated to building the Data Science ecosystem. My interests lie in the fields of Natural Language Processing (NLP), Deep Learning, and AI Agents.
