
agent-desktop

Desktop automation via native OS accessibility trees using the agent-desktop CLI. Use when an AI agent needs to observe, interact with, or automate desktop applications (click buttons, fill forms, navigate menus, read UI state, toggle checkboxes, scroll, drag, type text, take screenshots, manage windows, use clipboard). Covers 54 commands across observation, interaction, keyboard/mouse, app lifecycle, clipboard, and wait. Triggers on: "click button", "fill form", "open app", "read UI", "automate desktop", "accessibility tree", "snapshot app", "type into field", "navigate menu", "toggle checkbox", "take screenshot", "desktop automation", "agent-desktop", or any desktop GUI interaction task. Supports macOS (Phase 1), with Windows and Linux planned.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/darryek/aoto

agent-desktop

CLI tool enabling AI agents to observe and control desktop applications via native OS accessibility trees.

Core principle: agent-desktop is NOT an AI agent. It is a tool that AI agents invoke. It outputs structured JSON with ref-based element identifiers. The observation-action loop lives in the calling agent.

Installation

npm install -g agent-desktop
# or
bun install -g --trust agent-desktop

Requires macOS 12+ with Accessibility permission granted to your terminal.

Reference Files

Detailed documentation is split into focused reference files. Read them as needed:

Reference | Contents
--- | ---
references/commands-observation.md | snapshot, find, get, is, screenshot, list-surfaces (all flags, output examples)
references/commands-interaction.md | click, type, set-value, select, toggle, scroll, drag, keyboard, mouse (choosing the right command)
references/commands-system.md | launch, close, windows, clipboard, wait, batch, status, permissions, version
references/workflows.md | 12 common patterns: forms, menus, dialogs, scroll-find, drag-drop, async wait, anti-patterns
references/macos.md | macOS permissions/TCC, AX API internals, smart activation chain, surfaces, Notification Center, troubleshooting

The Observe-Act Loop

Every automation follows this pattern:

1. OBSERVE  → agent-desktop snapshot --app "App Name" -i
2. REASON   → Parse JSON, find target element by ref (@e1, @e2...)
3. ACT      → agent-desktop click @e5  (or type, select, toggle...)
4. VERIFY   → agent-desktop snapshot again to confirm state change
5. REPEAT   → Continue until task is complete

Always snapshot before acting. Refs are snapshot-scoped and become stale after UI changes.
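
The loop above can be sketched from the calling agent's side. This is a minimal sketch, not the tool's own client: it uses only the two invocations shown above (`snapshot --app ... -i` and `click <ref>`), and the names `run_cli`, `call`, `observe_act`, and the `choose_ref` callback are hypothetical. The shape of the snapshot's `data` payload is not specified here, so the REASON step is left to the caller.

```python
import json
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(args: Sequence[str]) -> str:
    """Invoke agent-desktop and return its stdout (the JSON envelope)."""
    proc = subprocess.run(["agent-desktop", *args], capture_output=True, text=True)
    return proc.stdout


def call(args: Sequence[str], runner: Runner = run_cli) -> dict:
    """Run one command and decode its JSON envelope."""
    return json.loads(runner(args))


def observe_act(
    app: str,
    choose_ref: Callable[[dict], Optional[str]],
    runner: Runner = run_cli,
    max_steps: int = 10,
) -> int:
    """Snapshot, let the caller pick a ref, click it, and repeat.

    choose_ref is the agent's REASON step (hypothetical): it receives the
    snapshot's "data" payload and returns a ref such as "@e5", or None
    when the task is complete. Returns the number of clicks performed.
    """
    actions = 0
    for _ in range(max_steps):
        snap = call(["snapshot", "--app", app, "-i"], runner)  # OBSERVE
        if not snap["ok"]:
            raise RuntimeError(snap["error"]["message"])
        ref = choose_ref(snap["data"])  # REASON
        if ref is None:
            return actions  # the fresh snapshot also served as VERIFY
        result = call(["click", ref], runner)  # ACT
        if not result["ok"]:
            raise RuntimeError(result["error"]["message"])
        actions += 1
    return actions
```

Because every iteration begins with a new snapshot, a ref is never reused across a UI change. Passing a fake `runner` makes the loop testable without Accessibility permission.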

Ref System

  • Refs assigned depth-first: @e1, @e2, @e3...
  • Only interactive elements get refs: button, textfield, checkbox, link, menuitem, tab, slider, combobox, treeitem, cell
  • Static text, groups, containers remain in tree for context but have no ref
  • Refs are deterministic within a snapshot but NOT stable across snapshots if UI changed
  • After any action that changes UI, run snapshot again for fresh refs
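
To illustrate why refs go stale, here is a small model of the numbering rules above. It is a sketch under assumptions, not the tool's implementation: the node shape (`role`, `name`, `children`) is hypothetical, and pre-order traversal is assumed because the source says only "depth-first".

```python
# Roles that receive a ref, taken from the list above.
INTERACTIVE_ROLES = {
    "button", "textfield", "checkbox", "link", "menuitem",
    "tab", "slider", "combobox", "treeitem", "cell",
}


def assign_refs(root: dict) -> dict:
    """Return {ref: node} for interactive nodes, numbered depth-first.

    Assumes pre-order traversal and a hypothetical node shape with
    "role" and optional "children" keys.
    """
    refs = {}

    def visit(node: dict) -> None:
        if node.get("role") in INTERACTIVE_ROLES:
            refs[f"@e{len(refs) + 1}"] = node
        for child in node.get("children", []):
            visit(child)

    visit(root)
    return refs


# A made-up tree: static text and containers stay in the tree but get no ref.
tree = {"role": "window", "children": [
    {"role": "group", "children": [
        {"role": "statictext", "name": "Name"},
        {"role": "button", "name": "OK"},
    ]},
    {"role": "textfield", "name": "Search"},
]}

before = assign_refs(tree)

# Simulate a UI change: a checkbox appears ahead of the button.
tree["children"][0]["children"].insert(0, {"role": "checkbox", "name": "Remember"})
after = assign_refs(tree)
```

In this model `@e1` names the OK button before the change and the new checkbox after it, which is why an action on an old ref could hit the wrong element and a fresh snapshot is needed.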

JSON Output Contract

Every command returns a JSON envelope on stdout:

Success:

{ "version": "1.0", "ok": true, "command": "snapshot", "data": { ... } }

Error:

{ "version": "1.0", "ok": false, "command": "click", "error": { "code": "STALE_REF", "message": "...", "suggestion": "..." } }

Exit codes: 0 success, 1 structured error, 2 argument error.
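
A caller might decode the envelope as follows. This is a minimal sketch that relies only on the fields shown above (`ok`, `data`, `error.code`, `error.message`, `error.suggestion`); the names `AgentDesktopError`, `parse_envelope`, and `needs_resnapshot` are hypothetical. What stdout contains on exit code 2 is not stated here, so the sketch covers only exit codes 0 and 1.

```python
import json

# Exit code meanings as listed above.
EXIT_MEANINGS = {0: "success", 1: "structured error", 2: "argument error"}


class AgentDesktopError(Exception):
    """Raised when an envelope has "ok": false (hypothetical helper)."""

    def __init__(self, code: str, message: str, suggestion: str = ""):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.suggestion = suggestion


def parse_envelope(stdout: str) -> dict:
    """Return the "data" payload on success, raise AgentDesktopError otherwise."""
    envelope = json.loads(stdout)
    if envelope["ok"]:
        return envelope["data"]
    err = envelope["error"]
    raise AgentDesktopError(err["code"], err["message"], err.get("suggestion", ""))


def needs_resnapshot(exc: AgentDesktopError) -> bool:
    """STALE_REF means the ref came from an outdated snapshot."""
    return exc.code == "STALE_REF"
```

Branching on `error.code` rather than on the message text keeps the caller stable if wording changes. On `STALE_REF` the natural recovery is to take a new snapshot and pick the target again.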

Error Codes

Metadata

Author: @darryek
Stars: 3376
Views: 0
Updated: 2026-03-24

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-darryek-aoto": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#desktop-automation #accessibility #ai-agent #gui-automation #cli

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.