Anthropic Computer Use: AI Assistant Taking Over Your Computer

Badrinarayan M Last Updated : 11 Dec, 2024

Imagine your AI assistant taking over your mouse and keyboard to navigate a computer just like you would—clicking, typing, and scrolling, all by “looking” at the screen. Anthropic’s latest update introduces this cool capability to their AI model, Claude. It’s in beta testing, but it’s already shaking up how AI can interact with software. They’re keeping safety in mind while exploring how this tech could transform productivity.

Why is Anthropic Focusing on Computer Use for AI?
Teaching AI to Think and Act on Screens
Balancing Innovation with Safety
How Does Anthropic Computer Use Work?
Capabilities of Anthropic Computer Use
Limitations and Challenges of Anthropic Computer Use
Exploring Computer Use with Claude: Methods and Examples
Using the Messages API for Computer Use
Reference Implementation Using a Docker Container
Setting Up Computer Use with Docker
Let’s try using Computer Use
Using the Anthropic Quickstarts App
Using Replit for Quick Deployment
Use Cases of Computer Use
Conclusion
Frequently Asked Questions

Why is Anthropic Focusing on Computer Use for AI?

Well, think about it: most of our daily tasks—whether at work or play—happen on a computer. By teaching AI to use software like a person does, we unlock endless possibilities. No more clunky custom tools; the AI could navigate any program seamlessly, like a digital assistant with superpowers.

This marks a big leap forward, following AI’s strides in logical thinking and image recognition. It’s not just about doing things better—it’s about doing what wasn’t possible before!

Teaching AI to Think and Act on Screens

Developing Claude’s computer use skills was a mix of creativity and technical rigour. By leveraging its existing multimodal capabilities, researchers trained Claude to “see” and interpret computer screens, translating visual data into actions. The key challenge? Teaching it to measure pixel distances accurately for cursor movements, a task similar to solving deceptively tricky logic puzzles. Starting with simple software like text editors and calculators, Claude quickly generalized these skills, surprising researchers with its ability to break down tasks into logical steps and even self-correct when needed.

Training wasn’t straightforward, but the payoff was significant. Claude can now perform actions on a computer in response to visual prompts, achieving state-of-the-art results on evaluations like OSWorld. Its 14.9% score is far from human-level accuracy (70-75%), but it is roughly double that of the next-best AI system. This technical achievement lays the foundation for broader applications, bringing AI closer to seamlessly integrating with everyday software.

Balancing Innovation with Safety

Every AI breakthrough comes with its safety challenges, and Claude’s computer-use skills are no exception. While these abilities don’t fundamentally increase the AI’s cognitive power, they lower the barrier for real-world applications. Safety evaluations show that Claude remains at AI Safety Level 2, meaning these skills do not call for stronger safeguards than those already in place. However, as future models grow more advanced, these skills might amplify risks, making it crucial to address vulnerabilities like “prompt injection” attacks early.

Anthropic’s Trust & Safety teams are proactively monitoring risks, such as misuse during events like elections, and have implemented measures like abuse detection and task nudging. Developers using Claude’s new skills are encouraged to follow best practices to minimize risks while the technology remains in public beta. Data privacy is also a priority; by default, Claude isn’t trained on user-submitted data or screenshots.

Computer Use is a feature of Anthropic’s Claude AI that lets it operate a computer through an API, mimicking the actions a person would typically perform with a screen, keyboard, and mouse. These actions range from accessing files and filling forms to automating web scraping and analyzing data. The sections below cover the workflow, its capabilities, and its limitations.

How Does Anthropic Computer Use Work?

1. Providing Tools and User Prompt

To enable computer use:

Add tools: Include Anthropic-defined computer use tools in your API request.
Craft a user prompt: For example, “Save a picture of a cat to my desktop” or “Fill out this form based on given information.”

The system interprets these prompts and checks whether the provided tools can help achieve the user’s goal.

2. Decision to Use a Tool

Once the system receives a prompt:

Claude reviews the tool definitions included in the request and evaluates whether one of them fits the task.
If suitable, Claude creates a tool use request (a formatted API call).
The API response contains a stop_reason field marked as tool_use, signaling that Claude intends to perform a tool action.
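
As a minimal sketch of this check, the helper below pulls tool-use requests out of a response. It assumes the response has already been converted to a plain dict shaped like the Messages API JSON (a `stop_reason` field plus a `content` list whose `tool_use` blocks carry `id`, `name`, and `input`). The function name and the sample payload are hypothetical and only serve as illustration.

```python
from typing import Any


def extract_tool_requests(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the tool_use blocks from a response, or [] if no tool was requested."""
    if response.get("stop_reason") != "tool_use":
        return []
    return [
        block
        for block in response.get("content", [])
        if block.get("type") == "tool_use"
    ]


# Hypothetical sample payload, for illustration only.
sample_response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "I'll take a screenshot first."},
        {
            "type": "tool_use",
            "id": "toolu_example_1",
            "name": "computer",
            "input": {"action": "screenshot"},
        },
    ],
}

for request in extract_tool_requests(sample_response):
    print(request["name"], request["input"])
```

Text blocks are skipped on purpose: Claude often explains what it is about to do before the tool request, and only the `tool_use` blocks need to be executed.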

3. Executing the Tool and Returning Results

This step involves:

Extracting the tool name and input from Claude’s request.
Using the tool on a container or virtual machine to execute the action.
Returning the result to Claude using a tool_result content block in a new user message.
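
The three steps above might look like the sketch below. It assumes tool requests are plain dicts with `id`, `name`, and `input` keys, and that the tool runner is a callable you supply; `build_tool_result_message` and `fake_run_tool` are hypothetical names, and the stand-in runner does not touch a real container or virtual machine.

```python
from typing import Any, Callable


def build_tool_result_message(
    tool_requests: list[dict[str, Any]],
    run_tool: Callable[[str, dict[str, Any]], str],
) -> dict[str, Any]:
    """Run each requested tool and wrap the outputs in a single user message."""
    results = []
    for request in tool_requests:
        try:
            output = run_tool(request["name"], request["input"])
            is_error = False
        except Exception as exc:  # report the failure to Claude instead of crashing
            output = f"Tool failed: {exc}"
            is_error = True
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": request["id"],  # ties the result to its request
                "content": output,
                "is_error": is_error,
            }
        )
    return {"role": "user", "content": results}


def fake_run_tool(name: str, tool_input: dict[str, Any]) -> str:
    """Hypothetical stand-in; a real runner would act inside the container or VM."""
    return f"{name} executed {tool_input.get('action', 'command')}"


message = build_tool_result_message(
    [{"id": "toolu_example_1", "name": "computer", "input": {"action": "screenshot"}}],
    fake_run_tool,
)
print(message)
```

Returning errors as `tool_result` blocks with `is_error` set gives Claude a chance to retry or pick a different action rather than ending the task abruptly.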

4. Iterative Problem-Solving

Claude operates in a loop:

Analyzing the results of the tool.
Deciding whether further tool use is needed.
Repeating the tool-use request until the task is completed.

Once the task is done, Claude generates a final text response for the user. This repetition of tool requests and results without new user input is often called the agent loop: at each step, Claude refers back to its previous actions and their results to decide what to do next.
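
A minimal sketch of this loop is shown below. To keep it runnable without network access, the model call is passed in as a `send` callable that takes the message history and returns a dict with `stop_reason` and `content`; in a real application that callable would wrap the Messages API request. `run_agent_loop`, `scripted_send`, and the `max_turns` cap are assumptions made for illustration, not part of Anthropic's reference code.

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_agent_loop(
    send: Callable[[list[Message]], Message],
    run_tool: Callable[[str, dict[str, Any]], str],
    user_prompt: str,
    max_turns: int = 10,
) -> str:
    """Alternate model turns and tool results until Claude stops requesting tools."""
    messages: list[Message] = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        response = send(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            # Final answer: join the text blocks.
            return "".join(
                block["text"] for block in response["content"] if block["type"] == "text"
            )
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": run_tool(block["name"], block["input"]),
            }
            for block in response["content"]
            if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"No final answer after {max_turns} turns")


def scripted_send(messages: list[Message]) -> Message:
    """Hypothetical scripted model: asks for one screenshot, then answers."""
    if len(messages) == 1:
        return {
            "stop_reason": "tool_use",
            "content": [
                {
                    "type": "tool_use",
                    "id": "toolu_1",
                    "name": "computer",
                    "input": {"action": "screenshot"},
                }
            ],
        }
    return {"stop_reason": "end_turn", "content": [{"type": "text", "text": "Done."}]}


answer = run_agent_loop(
    scripted_send, lambda name, tool_input: "ok", "Save a picture of a cat."
)
print(answer)
```

The `max_turns` cap is a deliberate design choice: it bounds both cost and runtime if the model never reaches a final answer.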

Capabilities of Anthropic Computer Use

Claude’s computer use feature enables it to handle tasks like:

File Manipulation:
- Accessing and editing Excel files.
- Saving screenshots or specific data to the system.
Form Automation:
- Filling out forms with provided user information.
- Automating repetitive data-entry tasks.
Web Scraping with Natural Language:
- Extracting information from websites.
- Leveraging natural language for precise data acquisition.

Essentially, Claude mimics human-like interactions with a computer system, offering robust automation and assistance.

Limitations and Challenges of Anthropic Computer Use

While powerful, computer use is not always perfect. For instance:

Unintended Actions: During a coding task, Claude might decide to perform irrelevant tasks (e.g., searching for a park instead of solving the coding issue). This could lead to delays and inefficiencies.
Infinite Loops: In some cases, Claude might enter an infinite loop of taking screenshots, analyzing, and repeating actions without reaching a resolution. This loop may inadvertently consume resources and time.
Risk Scenarios: Erroneous tool actions during sensitive operations (e.g., financial management) could result in serious consequences, such as mismanaged funds.
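
One way to limit the looping problem on the application side is to watch for the same tool call being requested over and over. The sketch below is an assumption about how such a guard could look, not a feature of the API; `RepeatGuard` and its `max_repeats` threshold are hypothetical.

```python
import json
from collections import deque
from typing import Any


class RepeatGuard:
    """Flags a run when the same tool call is requested several times in a row."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._recent: deque[str] = deque(maxlen=max_repeats)

    def should_stop(self, tool_name: str, tool_input: dict[str, Any]) -> bool:
        # Serialize with sorted keys so equal inputs always compare equal.
        key = json.dumps([tool_name, tool_input], sort_keys=True)
        self._recent.append(key)
        return (
            len(self._recent) == self.max_repeats and len(set(self._recent)) == 1
        )


guard = RepeatGuard(max_repeats=3)
calls = [
    ("computer", {"action": "screenshot"}),
    ("computer", {"action": "screenshot"}),
    ("computer", {"action": "screenshot"}),
]
for name, tool_input in calls:
    if guard.should_stop(name, tool_input):
        print("Stopping: same action repeated", guard.max_repeats, "times")
```

A guard like this only catches exact repeats. For sensitive operations such as financial tasks, a human confirmation step before executing the tool call is a safer complement.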

Exploring Computer Use with Claude: Methods and Examples

The documentation on computer use tools provides a detailed overview of enabling computer use features using various methods, including the Messages API. Below, we elaborate on these approaches and the resources available for implementation.

Using the Messages API for Computer Use

The Messages API facilitates communication between your application and Claude. By enabling computer use tools, developers can:

Programmatically send instructions.
Enable Claude to use computational resources.
Allow secure and controlled operations.

The API request declares which tools Claude may call, along with settings such as the display size. Your application carries out each tool call itself, so Claude can only act through the tools you define and the environment you run them in.

Code:

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            # Screen, mouse, and keyboard control.
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
            "display_number": 1,
        },
        {
            # Viewing and editing files.
            "type": "text_editor_20241022",
            "name": "str_replace_editor",
        },
        {
            # Running shell commands.
            "type": "bash_20241022",
            "name": "bash",
        },
    ],
    messages=[{"role": "user", "content": "Save a picture of a cat to my desktop."}],
    # Beta flag that enables the computer use tools.
    betas=["computer-use-2024-10-22"],
)

print(response)

Reference Implementation Using a Docker Container

A Docker container simplifies the setup process by encapsulating the required environment for computer use. This approach gives you a consistent configuration for development and testing, and Anthropic recommends it as well.

Setting Up Computer Use with Docker

To try out the Anthropic Computer Use feature via Docker, follow this step-by-step guide. This method provides a consistent and portable environment for utilizing computer use tools.

Step 1: Install Docker

If you don’t have Docker installed, start by installing it. Refer to the official documentation for installation instructions: Docker Installation Guide.

Key Prerequisites for Docker:

Virtualization Support: Ensure that your system supports virtualization (e.g., Intel VT-x or AMD-V) and that it is enabled in the BIOS/UEFI.
Windows Subsystem for Linux (WSL): On Windows, Docker Desktop typically runs on the WSL 2 backend. Install WSL following Microsoft’s WSL guide.
Hyper-V: On Windows editions that support it, Hyper-V can serve as the backend instead of WSL 2.

Step 2: Obtain an Anthropic API Key

To interact with Anthropic’s computer use tools, you’ll need an API key.

Go to the Anthropic Console: Get Your API Key.
Log in to your account and generate a new API key.
Complete the billing setup by purchasing some credits.

Note: Computer use can consume credits rapidly, so monitor usage closely to avoid unexpected charges.

Step 3: Set Up the Docker Container

With Docker installed and the Anthropic API key in hand, set up the container.

Command to Set the API Key:

set ANTHROPIC_API_KEY=ENTER_API_KEY_HERE

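Replace ENTER_API_KEY_HERE with your actual API key. The commands in this step use Windows Command Prompt syntax (set, %VARIABLE%, and ^ for line continuation).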

Verify the API Key:

echo %ANTHROPIC_API_KEY%

This command displays the stored key to ensure it’s correctly set.
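
Echoing the variable prints the full key to the terminal. As an alternative, the sketch below confirms that the variable is set while revealing only its last four characters. The helper name `describe_api_key` is hypothetical; the only assumption about the environment is the `ANTHROPIC_API_KEY` variable used above.

```python
import os
from typing import Mapping, Optional


def describe_api_key(
    env: Optional[Mapping[str, str]] = None, name: str = "ANTHROPIC_API_KEY"
) -> str:
    """Report whether the key is set, revealing only its last four characters."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        return f"{name} is not set"
    return f"{name} is set (ends with ...{key[-4:]})"


print(describe_api_key())
```

Passing a mapping instead of relying on `os.environ` makes the check easy to exercise in tests without touching the real environment.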

Run the Docker Container:

The following command will:

Download the Docker container (on the first run).
Start the container with the appropriate configuration.

docker run ^
    -e ANTHROPIC_API_KEY=%ANTHROPIC_API_KEY% ^
    -v %USERPROFILE%/.anthropic:/home/computeruse/.anthropic ^
    -p 5900:5900 ^
    -p 8501:8501 ^
    -p 6080:6080 ^
    -p 8080:8080 ^
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

Explanation of the Flags:

-e ANTHROPIC_API_KEY: Passes the API key as an environment variable to the container.
-v %USERPROFILE%/.anthropic:/home/computeruse/.anthropic: Mounts a local directory to the container for persistent storage.
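-p [PORT]:[PORT]: Publishes container ports on the host: 5900 for a direct VNC connection, 8501 for the Streamlit chat interface, 6080 for the browser-based desktop view (noVNC), and 8080 for the combined interface.
-it: Runs the container interactively with a terminal attached.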

On subsequent runs, the pre-downloaded container will be used, saving time.

Step 4: Access the Application

Once the container is running:

Open your browser and navigate to localhost on one of the mapped ports, for example http://localhost:8080. The terminal output also prints the localhost link once the container is ready.
Follow the instructions in the application interface to start using the computer use tools. See the linked guide for details on accessing the container.
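
If the page does not load, a quick way to tell whether the container is listening is to probe the mapped ports. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper, and the port list is copied from the docker run command above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the docker run command above.
for port in (8080, 8501, 6080, 5900):
    state = "open" if port_is_open("127.0.0.1", port) else "closed"
    print(f"localhost:{port} is {state}")
```

A closed port usually means the container has not finished starting, or that the corresponding -p flag was left out of the command.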

Monitoring Usage

Keep track of API credit consumption via the Anthropic Console.
Log container activities to understand resource utilization and optimize tool usage.
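
Each Messages API response reports a `usage` object with `input_tokens` and `output_tokens`, so consumption can also be tallied inside your own loop. The sketch below assumes responses have been converted to plain dicts; `UsageTally` is a hypothetical helper and the token figures are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class UsageTally:
    """Running total of tokens across the API calls made for one task."""

    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def add(self, response: dict[str, Any]) -> None:
        usage = response.get("usage", {})
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)
        self.calls += 1


tally = UsageTally()
# Hypothetical usage figures, for illustration only.
for fake_response in (
    {"usage": {"input_tokens": 1200, "output_tokens": 80}},
    {"usage": {"input_tokens": 2900, "output_tokens": 150}},
):
    tally.add(fake_response)
print(tally)
```

Because every screenshot is sent back as input on the next turn, input tokens tend to grow with each step, which is why long tasks consume credits quickly.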

By following this setup, you’ll have a fully functional environment for experimenting with Anthropic’s computer use tools via Docker.

Let’s try using Computer Use

Check this out to optimize your prompt when using computer use tools.

Prompt used: Give me a summary of AI Agent Pioneer Program from Analytics Vidhya. Give me a 2 paragraph summary. After each step, take a screenshot and carefully evaluate if you have achieved the right outcome. Explicitly show your thinking: “I have evaluated step X…” If not correct, try again. Only when you confirm a step was executed correctly should you move on to the next one.
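
The self-check instructions in this prompt can be reused for other tasks. As a small sketch, the helper below appends them to any task description; `with_verification` and `VERIFY_SUFFIX` are hypothetical names, and the wording is copied from the prompt above.

```python
VERIFY_SUFFIX = (
    "After each step, take a screenshot and carefully evaluate if you have "
    "achieved the right outcome. Explicitly show your thinking: "
    '"I have evaluated step X..." If not correct, try again. Only when you '
    "confirm a step was executed correctly should you move on to the next one."
)


def with_verification(task: str) -> str:
    """Append the self-check instructions from the prompt above to a task."""
    return f"{task.strip()} {VERIFY_SUFFIX}"


print(
    with_verification(
        "Give me a 2 paragraph summary of the AI Agent Pioneer Program "
        "from Analytics Vidhya."
    )
)
```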

Final Output

Here is a recorded video showcasing the entire process performed using Anthropic’s Computer Use feature.

Observing Decision-Making in Computer Use

During the execution of the Computer Use functionality, as demonstrated in the example video, a situation arose where a popup appeared requesting permission to allow notifications. Remarkably, the model autonomously decided not to allow notifications, showcasing its ability to make decisions and navigate through potential obstacles effectively.

This example highlights the high potential of the Computer Use feature to handle unexpected scenarios during task automation, maintaining focus on the primary objective while adapting to dynamic interactions in the user interface.

Using the Anthropic Quickstarts App

The Anthropic Quickstarts repository includes a demo application for computer use. This app is a straightforward alternative to the Docker container implementation, offering the same features but in a more app-centric format.

Advantages:

Lightweight: Eliminates the need for container orchestration.
Extensible: Developers can modify the app to suit their specific use cases.

The demo application mirrors the Docker container functionality, making it an excellent choice for those who prefer app-based implementations.

Using Replit for Quick Deployment

Replit is an online development environment that supports deploying and experimenting with Claude’s computer use capabilities. It is particularly useful for developers looking for a cloud-based solution.

Benefits:

Instant Setup: No need to install software locally; everything runs in the browser.
Interactive Development: Test and tweak your implementation in real-time.
Collaboration: Share your projects with other developers seamlessly.

The Replit project includes a prebuilt environment and is an excellent way to explore Claude’s computer use features without setting up a local development environment.

Use Cases of Computer Use

Claude | Computer use for coding

Claude | Computer use for orchestrating tasks

Conclusion

Anthropic’s Computer Use demonstrates a groundbreaking step in AI-driven automation by seamlessly performing complex tasks like file management, form filling, and web scraping. Its ability to mimic human interaction, adapt to unexpected scenarios, and handle obstacles, such as dismissing popups, underscores its immense potential for practical applications. The use of Docker containers and platforms like Replit ensures that developers can easily deploy and experiment with this technology.

However, while its capabilities are impressive, challenges such as occasional inefficiencies and unintended actions highlight the need for careful implementation and monitoring. With continuous advancements, Computer Use has the potential to redefine task automation, offering a glimpse into a future where AI becomes an indispensable part of everyday computing.

If you are looking to build AI agents, explore the Agentic AI Pioneer Program.

Frequently Asked Questions

Q1. What is Anthropic’s Computer Use?

Ans. Anthropic Computer Use enables AI to interact with computer systems, performing tasks like file manipulation, form filling, and web scraping, similar to how a person uses a screen, keyboard, and mouse.

Q2. What are its primary capabilities?

Ans. It can handle tasks such as accessing and editing files, automating repetitive form filling, and extracting web data using natural language commands.

Q3. What are the limitations of this feature?

Ans. Challenges include potential inefficiencies, unintended actions, and resource-heavy operations, which require careful monitoring to avoid issues like infinite loops.

Q4. Is it safe to use for sensitive tasks?

Ans. While it includes safety features, users should exercise caution during critical tasks to prevent undesired actions, such as mismanaging sensitive data.

Badrinarayan M

Data science Trainee at Analytics Vidhya, specializing in ML, DL and Gen AI. Dedicated to sharing insights through articles on these subjects. Eager to learn and contribute to the field's advancements. Passionate about leveraging data to solve complex problems and drive innovation.
