computer-control
Automate desktop GUI workflows via Claude computer use API with screenshot capture and mouse/keyboard control
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/athola/nm-phantom-computer-controlNight Market Skill — ported from claude-night-market/phantom. For the full experience with agents, hooks, and commands, install the Claude Code plugin.
Computer Control Skill
Use Claude's Computer Use API to see and control desktop environments through screenshots and mouse/keyboard actions.
When To Use
- Automating GUI-based workflows that lack CLI alternatives
- Testing web applications through visual interaction
- Filling forms, navigating menus, or interacting with desktop apps
- Building automation pipelines that need visual verification
When NOT To Use
- Tasks achievable through CLI or API (no GUI needed)
- Browser automation better served by Playwright or CDP
Architecture
The computer use system has three layers:
- Display Toolkit (
phantom.display) - executes OS-level actions via xdotool/scrot on the real or virtual display - Agent Loop (
phantom.loop) - manages the conversation cycle between Claude API and the display toolkit - CLI (
phantom.cli) - command-line interface for running tasks or checking environment readiness
User Task
|
v
Agent Loop <----> Claude API (beta)
| |
v v
Display Toolkit tool_use responses
| (click, type, screenshot)
v
OS Commands (xdotool, scrot)
|
v
Display (X11 / Xvfb / WSLg)
Quick Start
Check environment
cd plugins/phantom
uv run python -m phantom.cli --check
Run a task
export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"
Use in Python
from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop
result = run_loop(
task="Take a screenshot of the desktop",
api_key="sk-ant-...",
loop_config=LoopConfig(
model="claude-sonnet-4-6",
max_iterations=10,
),
display_config=DisplayConfig(width=1920, height=1080),
)
print(f"Done in {result.iterations} iterations")
print(result.final_text)
API Versions
| Model | Tool Version | Beta Flag |
|---|---|---|
| Opus 4.6, Sonnet 4.6, Opus 4.5 | computer_20251124 | computer-use-2025-11-24 |
| Sonnet 4.5, Haiku 4.5, older | computer_20250124 | computer-use-2025-01-24 |
The resolve_tool_version() function handles this mapping
automatically based on the model name.
Available Actions
All versions:
screenshot- capture displayleft_click- click at[x, y]type- type text stringkey- press key combo (e.g.,ctrl+s)mouse_move- move cursor
Enhanced (20250124+):
scroll- scroll with direction and amountleft_click_drag- drag between coordinatesright_click,middle_click,double_click,triple_clickhold_key- hold key for durationwait- pause between actions
Metadata
Not sure this is the right skill?
Describe what you want to build — we'll match you to the best skill from 16,000+ options.
Find the right skillPaste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-athola-nm-phantom-computer-control": {
"enabled": true,
"auto_update": true
}
}
}Related Skills
extract
Analyze a codebase and build a knowledge base of business logic, architecture, data flow, and engineering patterns. The foundation for gauntlet challenges and agent integration
discourse
>- Scan community discussion channels (HN, Lobsters, Reddit, tech blogs) for experience reports and opinions on a topic
synthesize
>- Merge, deduplicate, rank, and format research findings from multiple channels into a coherent report. Use after research agents return their results
workflow-monitor
Detect workflow failures and inefficient patterns, then create GitHub issues for improvement via /fix-workflow
architecture-paradigm-hexagonal
Hexagonal (Ports and Adapters) architecture isolating domain logic from infrastructure