Official Verified

computer-control

Automate desktop GUI workflows via Claude computer use API with screenshot capture and mouse/keyboard control

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/athola/nm-phantom-computer-control

Night Market Skill — ported from claude-night-market/phantom. For the full experience with agents, hooks, and commands, install the Claude Code plugin.

Computer Control Skill

Use Claude's Computer Use API to see and control desktop environments through screenshots and mouse/keyboard actions.

When To Use

Automating GUI-based workflows that lack CLI alternatives
Testing web applications through visual interaction
Filling forms, navigating menus, or interacting with desktop apps
Building automation pipelines that need visual verification

When NOT To Use

Tasks achievable through CLI or API (no GUI needed)
Browser automation better served by Playwright or CDP

Architecture

The computer use system has three layers:

Display Toolkit (phantom.display) - executes OS-level actions via xdotool/scrot on the real or virtual display
Agent Loop (phantom.loop) - manages the conversation cycle between Claude API and the display toolkit
CLI (phantom.cli) - command-line interface for running tasks or checking environment readiness

User Task
    |
    v
Agent Loop  <---->  Claude API (beta)
    |                   |
    v                   v
Display Toolkit    tool_use responses
    |              (click, type, screenshot)
    v
OS Commands (xdotool, scrot)
    |
    v
Display (X11 / Xvfb / WSLg)

Quick Start

Check environment

cd plugins/phantom
uv run python -m phantom.cli --check

Run a task

export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"

Use in Python

from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop

result = run_loop(
    task="Take a screenshot of the desktop",
    api_key="sk-ant-...",
    loop_config=LoopConfig(
        model="claude-sonnet-4-6",
        max_iterations=10,
    ),
    display_config=DisplayConfig(width=1920, height=1080),
)

print(f"Done in {result.iterations} iterations")
print(result.final_text)

API Versions

Model	Tool Version	Beta Flag
Opus 4.6, Sonnet 4.6, Opus 4.5	`computer_20251124`	`computer-use-2025-11-24`
Sonnet 4.5, Haiku 4.5, older	`computer_20250124`	`computer-use-2025-01-24`

The resolve_tool_version() function handles this mapping automatically based on the model name.

Available Actions

All versions:

screenshot - capture display
left_click - click at [x, y]
type - type text string
key - press key combo (e.g., ctrl+s)
mouse_move - move cursor

Enhanced (20250124+):

scroll - scroll with direction and amount
left_click_drag - drag between coordinates
right_click, middle_click, double_click, triple_click
hold_key - hold key for duration
wait - pause between actions

Read Full Documentation on GitHub

Metadata

Author@athola

Stars4473

Updated2026-05-01

View Author Profile

AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-athola-nm-phantom-computer-control": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

extract

Analyze a codebase and build a knowledge base of business logic, architecture, data flow, and engineering patterns. The foundation for gauntlet challenges and agent integration

athola 4473

discourse

>- Scan community discussion channels (HN, Lobsters, Reddit, tech blogs) for experience reports and opinions on a topic

athola 4473

synthesize

>- Merge, deduplicate, rank, and format research findings from multiple channels into a coherent report. Use after research agents return their results

athola 4473

workflow-monitor

Detect workflow failures and inefficient patterns, then create GitHub issues for improvement via /fix-workflow

athola 4473

architecture-paradigm-hexagonal

Hexagonal (Ports and Adapters) architecture isolating domain logic from infrastructure

athola 4473