agent-desktop

Desktop automation via native OS accessibility trees using the agent-desktop CLI. Use when an AI agent needs to observe, interact with, or automate desktop applications (click buttons, fill forms, navigate menus, read UI state, toggle checkboxes, scroll, drag, type text, take screenshots, manage windows, use clipboard). Covers 54 commands across observation, interaction, keyboard/mouse, app lifecycle, clipboard, and wait. Triggers on: "click button", "fill form", "open app", "read UI", "automate desktop", "accessibility tree", "snapshot app", "type into field", "navigate menu", "toggle checkbox", "take screenshot", "desktop automation", "agent-desktop", or any desktop GUI interaction task. Supports macOS (Phase 1), with Windows and Linux planned.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/darryek/aoto

agent-desktop

CLI tool enabling AI agents to observe and control desktop applications via native OS accessibility trees.

Core principle: agent-desktop is NOT an AI agent. It is a tool that AI agents invoke. It outputs structured JSON with ref-based element identifiers. The observation-action loop lives in the calling agent.

Installation

npm install -g agent-desktop
# or
bun install -g --trust agent-desktop

Requires macOS 12+ with Accessibility permission granted to your terminal.

Reference Files

Detailed documentation is split into focused reference files. Read them as needed:

| Reference | Contents |
| --- | --- |
| `references/commands-observation.md` | snapshot, find, get, is, screenshot, list-surfaces (all flags, output examples) |
| `references/commands-interaction.md` | click, type, set-value, select, toggle, scroll, drag, keyboard, mouse (choosing the right command) |
| `references/commands-system.md` | launch, close, windows, clipboard, wait, batch, status, permissions, version |
| `references/workflows.md` | 12 common patterns: forms, menus, dialogs, scroll-find, drag-drop, async wait, anti-patterns |
| `references/macos.md` | macOS permissions/TCC, AX API internals, smart activation chain, surfaces, Notification Center, troubleshooting |

The Observe-Act Loop

Every automation follows this pattern:

1. OBSERVE  → agent-desktop snapshot --app "App Name" -i
2. REASON   → Parse JSON, find target element by ref (@e1, @e2...)
3. ACT      → agent-desktop click @e5  (or type, select, toggle...)
4. VERIFY   → agent-desktop snapshot again to confirm state change
5. REPEAT   → Continue until task is complete

Always snapshot before acting. Refs are snapshot-scoped and become stale after UI changes.
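
The loop above can be driven from a calling agent roughly as follows. This is a minimal sketch, not part of agent-desktop: `observe_act_loop`, `choose`, and `run_cli` are hypothetical names, and the only invocations used are the `snapshot --app ... -i` and `click <ref>` forms shown above. The `choose` callback stands in for the agent's reasoning step. The layout of the snapshot `data` is not described in this section, so the sketch hands it to `choose` untouched.

```python
import json
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], dict]


def run_cli(args: List[str]) -> dict:
    """Invoke agent-desktop and parse the JSON envelope printed on stdout."""
    proc = subprocess.run(["agent-desktop", *args], capture_output=True, text=True)
    return json.loads(proc.stdout)


def observe_act_loop(
    app: str,
    choose: Callable[[dict], Optional[str]],  # hypothetical: returns a ref or None when done
    run: Runner = run_cli,
    max_steps: int = 10,
) -> int:
    """Snapshot, pick a ref, click it, and repeat. Returns the number of clicks."""
    for step in range(max_steps):
        snap = run(["snapshot", "--app", app, "-i"])   # OBSERVE (also verifies the previous click)
        if not snap["ok"]:
            raise RuntimeError(snap["error"]["code"])
        ref = choose(snap["data"])                     # REASON (done by the calling agent)
        if ref is None:
            return step                                # task complete
        result = run(["click", ref])                   # ACT, using a ref from this snapshot only
        if not result["ok"]:
            raise RuntimeError(result["error"]["code"])
    raise RuntimeError("max_steps reached before the task completed")
```

Because every iteration begins with a fresh snapshot, a ref is never carried over from one UI state to the next, and the snapshot that opens iteration N+1 doubles as the verification step for the action taken in iteration N.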

Ref System

Refs are assigned depth-first: @e1, @e2, @e3...
Only interactive elements get refs: button, textfield, checkbox, link, menuitem, tab, slider, combobox, treeitem, cell
Static text, groups, and containers remain in the tree for context but have no ref
Refs are deterministic within a snapshot but NOT stable across snapshots if the UI changed
After any action that changes the UI, run snapshot again for fresh refs
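
To make the numbering rule concrete, the sketch below assigns refs over a toy tree. It illustrates the rule as stated above and is not agent-desktop's actual implementation: the node shape (`role`, `children`, `ref` keys) is hypothetical, and pre-order traversal is an assumption, since the text only says "depth-first".

```python
# Roles listed above as interactive; only these receive a ref.
INTERACTIVE_ROLES = {
    "button", "textfield", "checkbox", "link", "menuitem",
    "tab", "slider", "combobox", "treeitem", "cell",
}


def assign_refs(node: dict, counter=None) -> dict:
    """Walk the tree depth-first (pre-order, assumed) and number interactive nodes."""
    if counter is None:
        counter = [0]  # one-element list so recursive calls share the same count
    if node.get("role") in INTERACTIVE_ROLES:
        counter[0] += 1
        node["ref"] = f"@e{counter[0]}"
    for child in node.get("children", []):
        assign_refs(child, counter)
    return node


# Toy tree: the group and the static text stay in the tree but get no ref.
tree = {
    "role": "window",
    "children": [
        {"role": "group", "children": [
            {"role": "button"},
            {"role": "statictext"},
        ]},
        {"role": "textfield"},
    ],
}
assign_refs(tree)
```

Numbering depends only on tree order, which is why the same UI state yields the same refs, while inserting or removing a single interactive element shifts every ref after it.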

JSON Output Contract

Every command returns a JSON envelope on stdout:

Success:
{ "version": "1.0", "ok": true, "command": "snapshot", "data": { ... } }

Error:
{ "version": "1.0", "ok": false, "command": "click", "error": { "code": "STALE_REF", "message": "...", "suggestion": "..." } }

Exit codes: 0 success, 1 structured error, 2 argument error.
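
A caller can unwrap the envelope along these lines. This is a hedged sketch: `parse_envelope` and `AgentDesktopError` are hypothetical helper names, and only the fields shown in the envelope above are read. Whether an argument error (exit code 2) also prints an envelope is not stated here, so the sketch assumes nothing about it.

```python
import json

# Exit code meanings as listed above.
EXIT_CODES = {0: "success", 1: "structured error", 2: "argument error"}


class AgentDesktopError(Exception):
    """Raised when an envelope reports ok == false."""

    def __init__(self, command: str, code: str, message: str, suggestion: str):
        super().__init__(f"{command} failed with {code}: {message}")
        self.command = command
        self.code = code
        self.suggestion = suggestion


def parse_envelope(stdout: str) -> dict:
    """Return the `data` payload on success, or raise AgentDesktopError."""
    envelope = json.loads(stdout)
    if envelope["ok"]:
        return envelope["data"]
    err = envelope["error"]
    raise AgentDesktopError(
        envelope["command"],
        err["code"],
        err.get("message", ""),
        err.get("suggestion", ""),
    )
```

Branching on `error.code` rather than on the message text keeps the caller's recovery logic stable; for `STALE_REF`, for example, the natural recovery is to take a new snapshot and pick a fresh ref.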

Error Codes

Read Full Documentation on GitHub

Metadata

Author: @darryek

Stars: 3376

Updated: 2026-03-24

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-darryek-aoto": {
      "enabled": true,
      "auto_update": true
    }
  }
}
