Computer Use
Claude can see your screen, click, type, and operate any application — like a digital coworker.
What is Computer Use?
Computer Use is Claude's ability to control a computer — clicking, typing, scrolling, taking screenshots, and running applications — just like a human would. You give Claude a goal; it sees your screen and takes actions to achieve it.
Computer Use is among Claude's most powerful capabilities and also one of its most experimental. Claude can navigate legacy internal tools with no API, fill multi-page forms, scrape data from sites that block automated bots, QA-test your own UI, and automate repetitive computer workflows that would otherwise require robotic process automation (RPA) software.
How Computer Use works
The workflow is a loop:
- Claude takes a screenshot of the screen (or a region)
- Claude analyzes what it sees and decides the next action
- Claude executes the action: click(x,y), type("text"), key("Ctrl+C"), scroll(x,y,direction), etc.
- Claude takes another screenshot to see the result
- Repeat until the goal is achieved or Claude asks for help
This loop is powered by Claude's vision capability combined with three Computer Use tools: computer (interact with screen), text_editor (read/write files), and bash (run terminal commands).
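The control flow of this loop can be sketched in a few lines. This is a minimal simulation, not the Anthropic API: `fake_model`, `execute`, and `run_loop` are hypothetical names, and the "screen" is just a dictionary holding the text of one form field. It shows only the shape of the loop (observe, decide, act, repeat), plus a step cap so that a stuck run terminates.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: the "screen" is a dict, an "action" is a dict.
Screen = dict
Action = dict


def take_screenshot(screen: Screen) -> Screen:
    """Return a snapshot of the current screen state (a copy, like an image)."""
    return dict(screen)


def fake_model(goal: str, snapshot: Screen) -> Optional[Action]:
    """Stand-in for Claude: pick the next action, or None when the goal is met."""
    typed = snapshot.get("field", "")
    if typed == goal:
        return None  # goal achieved, stop
    # Type the next missing character of the goal text.
    return {"action": "type", "text": goal[len(typed)]}


def execute(screen: Screen, action: Action) -> None:
    """Apply an action to the fake screen."""
    if action["action"] == "type":
        screen["field"] = screen.get("field", "") + action["text"]


def run_loop(goal: str, screen: Screen, decide: Callable, max_steps: int = 50) -> int:
    """Observe, decide, act, repeat. Returns the number of actions taken."""
    for step in range(max_steps):
        snapshot = take_screenshot(screen)   # 1. look at the screen
        action = decide(goal, snapshot)      # 2. choose the next action
        if action is None:
            return step
        execute(screen, action)              # 3. act, then loop to look again
    raise RuntimeError("step limit reached before the goal was achieved")


screen = {"field": ""}
steps = run_loop("hello", screen, fake_model)
print(steps, screen["field"])
```

The step cap matters in practice: a real run can get stuck on an unexpected dialog, and a hard limit keeps it from repeating the same action indefinitely.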
Setting up Computer Use via the API
Method 1: Docker quickstart (recommended)
Anthropic provides a Docker image with a full Linux desktop environment (X11) wired up for Computer Use. This is the safest approach: Claude controls an isolated container, not your real machine.
# Pull and run the reference implementation
docker pull ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
docker run \
    -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    -v $HOME/.anthropic:/home/computeruse/.anthropic \
    -p 5900:5900 -p 8501:8501 -p 6080:6080 -p 8080:8080 \
    ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
Then open http://localhost:8080 in your browser. You'll see a Streamlit UI with a chat interface on the left and a live desktop view on the right. Type a task and watch Claude execute it.
Method 2: API directly (advanced)
import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

# Tool versions and the beta flag are tied to the model generation;
# confirm the current pairing in Anthropic's computer use documentation.
response = client.beta.messages.create(
    model="claude-opus-4-5",
    max_tokens=4096,
    tools=[
        {
            "type": "computer_20251124",
            "name": "computer",
            "display_width_px": 1280,
            "display_height_px": 800,
            "display_number": 1,
        },
        {"type": "text_editor_20250728", "name": "str_replace_based_edit_tool"},
        {"type": "bash_20250124", "name": "bash"},
    ],
    messages=[{
        "role": "user",
        "content": "Open Firefox, go to our internal dashboard at http://localhost:3000, "
                   "take a screenshot of the Orders section, and save it as orders_screenshot.png"
    }],
    betas=["computer-use-2025-11-24"],
)
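# Inspect the first response. A full agent would execute each tool_use block
# and send the result back, repeating until no tool calls remain.
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
    elif block.type == "text":
        print(f"Claude: {block.text}")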
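A single API call only returns Claude's first requested action. To continue, each `tool_use` block has to be executed and answered with a `tool_result` block in the next user message. The sketch below shows one way to build that message. `run_tool` is a hypothetical placeholder executor, the block ids are made up, and the blocks are plain dicts so the example runs without an API key; with the SDK you would read `block.id`, `block.name`, and `block.input` from the response objects instead.

```python
import base64


def run_tool(name: str, tool_input: dict) -> dict:
    """Hypothetical executor: a real one would drive the desktop or a shell."""
    if name == "computer" and tool_input.get("action") == "screenshot":
        fake_png = b"\x89PNG placeholder bytes"  # not a real image
        return {"image_b64": base64.b64encode(fake_png).decode("ascii")}
    return {"text": f"executed {name}: {tool_input}"}


def build_tool_results(content_blocks: list) -> dict:
    """Turn the tool_use blocks of one assistant turn into the next user message."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue  # text blocks need no reply
        output = run_tool(block["name"], block["input"])
        if "image_b64" in output:
            # Screenshots go back as base64-encoded image content.
            content = [{
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": output["image_b64"],
                },
            }]
        else:
            content = [{"type": "text", "text": output["text"]}]
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the tool_use block's id
            "content": content,
        })
    return {"role": "user", "content": results}


# Blocks shaped like one assistant turn (ids are invented for the example).
assistant_blocks = [
    {"type": "text", "text": "I'll take a screenshot first."},
    {"type": "tool_use", "id": "toolu_01", "name": "computer",
     "input": {"action": "screenshot"}},
    {"type": "tool_use", "id": "toolu_02", "name": "bash",
     "input": {"command": "ls"}},
]
next_message = build_tool_results(assistant_blocks)
print([r["tool_use_id"] for r in next_message["content"]])
```

In a real agent, you would append the assistant turn and this message to `messages`, call the API again, and stop once a response contains no `tool_use` blocks.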
The tool actions Claude can take
| Action | What it does | Example |
|---|---|---|
| screenshot | Capture current screen state | Check the result of the previous action |
| left_click | Click at coordinates | Click a button, link, or form field |
| right_click | Right-click at coordinates | Open context menus |
| double_click | Double-click at coordinates | Open files, select words |
| type | Type text | Fill form fields |
| key | Press keyboard keys | Ctrl+C, Enter, Tab, F5 |
| scroll | Scroll in a direction | Scroll pages, dropdown lists |
| mouse_move | Move without clicking | Hover for tooltips |
| cursor_position | Get current cursor location | Verify position before clicking |
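On a Linux X11 desktop, one way to carry out these actions is to translate each into an `xdotool` command line. The sketch below builds the argument list only and does not run it. `to_xdotool` is a hypothetical helper, and the input field names (`coordinate`, `text`, `scroll_direction`, `scroll_amount`) are assumptions about the tool input shape that should be checked against the tool version you use. `screenshot` is left out because `xdotool` does not capture the screen.

```python
def to_xdotool(tool_input: dict) -> list:
    """Translate one action into an xdotool argument list (not executed here)."""
    action = tool_input["action"]
    # X11 button numbers: 1 = left, 3 = right, 4-7 = scroll wheel directions.
    click_buttons = {"left_click": "1", "right_click": "3"}
    scroll_buttons = {"up": "4", "down": "5", "left": "6", "right": "7"}

    def move(coord):
        x, y = coord
        return ["xdotool", "mousemove", "--sync", str(x), str(y)]

    if action in click_buttons:
        return move(tool_input["coordinate"]) + ["click", click_buttons[action]]
    if action == "double_click":
        return move(tool_input["coordinate"]) + ["click", "--repeat", "2", "1"]
    if action == "mouse_move":
        return move(tool_input["coordinate"])
    if action == "type":
        # "--" stops option parsing so text starting with "-" is typed literally.
        return ["xdotool", "type", "--", tool_input["text"]]
    if action == "key":
        return ["xdotool", "key", "--", tool_input["text"]]
    if action == "scroll":
        button = scroll_buttons[tool_input["scroll_direction"]]
        amount = str(tool_input.get("scroll_amount", 1))
        return move(tool_input["coordinate"]) + ["click", "--repeat", amount, button]
    if action == "cursor_position":
        return ["xdotool", "getmouselocation", "--shell"]
    raise ValueError(f"unsupported action: {action}")


print(to_xdotool({"action": "left_click", "coordinate": [640, 400]}))
print(to_xdotool({"action": "key", "text": "ctrl+c"}))
```

Returning an argument list rather than a shell string means the result can be passed to `subprocess.run` without shell quoting, which matters when typed text contains spaces or quotes.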
Computer Use via Claude Desktop (Claude in Chrome)
For web-specific automation, Anthropic offers Claude in Chrome — a browser extension where Claude can navigate and interact with websites in your active Chrome tab.
- Install Claude Desktop
- Install the Claude in Chrome extension from the Chrome Web Store
- Open Claude Desktop, go to Settings → Integrations → Chrome
- Enable the Chrome integration
- In Claude Desktop, click the Chrome icon in the message bar
- Type: "Go to amazon.in and find the top-rated laptop under ₹50,000"
Real-world use cases
- Legacy system data extraction — scrape data from internal tools that have no API but have a UI
- QA testing — run through UI test scripts and take screenshots of each step
- Multi-step form automation — fill government forms, HR portals, compliance submissions
- Research workflows — open multiple tabs, read, compare, and summarize findings
- Dashboard monitoring — take scheduled screenshots of dashboards and alert on anomalies
- Software installation — guide Claude through installing and configuring complex tools