Stanford Doctors Deem GPT-4 Unfit for Medical Assistance

K.C. Sabreena Basheer | Last Updated: 19 Dec, 2023 | 2 min read

In a recent exploration of AI in healthcare, Stanford experts shed light on the safety and accuracy of large language models like GPT-4 in meeting clinicians' information needs. The New England Journal of Medicine perspective by Lee et al. delves into the benefits, limitations, and potential risks of using GPT-4 for medical consultations.

GPT-4 in Medicine

The study discusses the role of GPT-4 in curbside consultations and its potential to assist healthcare professionals, focusing in particular on the use of AI to aid physicians in patient care. However, it highlights a gap in quantitative evaluation, questioning how effectively the tool actually enhances the performance of medical practitioners.


Foundation Models in Healthcare

The article notes how rapidly foundation models like GPT-4 have been integrated into generative applications of all kinds, raising concerns about bias, inconsistency, and non-deterministic behavior. Despite these public apprehensions, such models are gaining popularity in the healthcare sector.

Also Read: Unlocking the Future: GPT-4’s Radiant Promise in Radiology

Safety and Usefulness Analysis

To assess the safety and usefulness of GPT-4 in AI-human collaboration, the Stanford team analyzed the models' responses to clinical questions arising during care delivery. Preliminary results, yet to be posted to arXiv, indicate a high percentage of safe responses but reveal variation in agreement with known answers.

Also Read: GPT-4 Is Being Lazy: OpenAI Acknowledges

Clinician Review and Reliability

Twelve clinicians from different specialties reviewed GPT-3.5 and GPT-4 responses, rating each for safety and for agreement with known answers. The findings suggest that the majority of responses were deemed safe, though hallucinated citations pose a risk of harm. Clinicians' assessments of agreement with known answers also varied, underscoring the need for refinement.
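To make the review process concrete, here is a minimal sketch (not the Stanford team's actual code) of how such clinician ratings might be tallied per model. The rating schema below, with "safe"/"unsafe" safety labels and "agree"/"disagree"/"unclear" agreement labels, is a hypothetical simplification for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: one per (clinician, model) review of a response.
# Each tuple: (model, clinician_id, safety_label, agreement_label)
ratings = [
    ("gpt-4",   "c01", "safe",   "agree"),
    ("gpt-4",   "c02", "safe",   "unclear"),
    ("gpt-3.5", "c01", "safe",   "disagree"),
    ("gpt-3.5", "c03", "unsafe", "disagree"),
    # ... one row per review in the real study
]

def summarize(records):
    """Per-model share of responses rated safe, plus agreement breakdown."""
    safety = defaultdict(Counter)
    agreement = defaultdict(Counter)
    for model, _clinician, safe_label, agree_label in records:
        safety[model][safe_label] += 1
        agreement[model][agree_label] += 1
    for model in sorted(safety):
        n = sum(safety[model].values())
        pct_safe = 100.0 * safety[model]["safe"] / n
        print(f"{model}: {pct_safe:.0f}% rated safe (n={n}), "
              f"agreement={dict(agreement[model])}")

summarize(ratings)
```

Aggregating over clinicians this way also exposes the variation the study flags: when raters split on "agree" versus "unclear" for the same response, the agreement breakdown makes that disagreement visible rather than hiding it in a single average.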

Our Say

While GPT-4 demonstrates promise in aiding clinicians, the study underscores the importance of rigorous evaluation before these technologies are relied upon routinely. The ongoing analysis aims to delve deeper into the nature of potential harm, the root causes of assessment challenges, and the impact of further prompt engineering on answer quality. The call for calibrated uncertainty estimates for low-confidence answers echoes the need for continuous refinement. With better training over time, such AI models may yet earn a trusted place in healthcare assistance.
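For readers unfamiliar with what "calibrated uncertainty" means in practice, below is a minimal sketch of one common calibration check, expected calibration error (ECE): answers are binned by the model's stated confidence, and each bin's average confidence is compared with its observed accuracy. The confidence and correctness values here are illustrative, not taken from the Stanford study.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: confidence-weighted gap between stated confidence and accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

# Toy example: a well-calibrated model's low-confidence answers should
# be wrong about as often as the stated confidence implies.
confs =   [0.95, 0.90, 0.60, 0.55, 0.30]
correct = [1,    1,    1,    0,    0]
print(f"ECE = {expected_calibration_error(confs, correct):.3f}")
```

A low ECE would mean a clinician could take the model's confidence at face value when deciding whether to double-check an answer, which is exactly the property the researchers are calling for.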

Sabreena Basheer is an architect-turned-writer who's passionate about documenting anything that interests her. She's currently exploring the world of AI and Data Science as a Content Manager at Analytics Vidhya.
