agent-desktop
Desktop automation via native OS accessibility trees using the agent-desktop CLI. Use when an AI agent needs to observe, interact with, or automate desktop applications (click buttons, fill forms, navigate menus, read UI state, toggle checkboxes, scroll, drag, type text, take screenshots, manage windows, use clipboard). Covers 54 commands across observation, interaction, keyboard/mouse, app lifecycle, clipboard, and wait. Triggers on: "click button", "fill form", "open app", "read UI", "automate desktop", "accessibility tree", "snapshot app", "type into field", "navigate menu", "toggle checkbox", "take screenshot", "desktop automation", "agent-desktop", or any desktop GUI interaction task. Supports macOS (Phase 1), with Windows and Linux planned.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/darryek/aotoagent-desktop
CLI tool enabling AI agents to observe and control desktop applications via native OS accessibility trees.
Core principle: agent-desktop is NOT an AI agent. It is a tool that AI agents invoke. It outputs structured JSON with ref-based element identifiers. The observation-action loop lives in the calling agent.
Installation
npm install -g agent-desktop
# or
bun install -g --trust agent-desktop
Requires macOS 12+ with Accessibility permission granted to your terminal.
Reference Files
Detailed documentation is split into focused reference files. Read them as needed:
| Reference | Contents |
|---|---|
| references/commands-observation.md | snapshot, find, get, is, screenshot, list-surfaces: all flags, output examples |
| references/commands-interaction.md | click, type, set-value, select, toggle, scroll, drag, keyboard, mouse; choosing the right command |
| references/commands-system.md | launch, close, windows, clipboard, wait, batch, status, permissions, version |
| references/workflows.md | 12 common patterns: forms, menus, dialogs, scroll-find, drag-drop, async wait, anti-patterns |
| references/macos.md | macOS permissions/TCC, AX API internals, smart activation chain, surfaces, Notification Center, troubleshooting |
The Observe-Act Loop
Every automation follows this pattern:
1. OBSERVE → agent-desktop snapshot --app "App Name" -i
2. REASON → Parse JSON, find target element by ref (@e1, @e2...)
3. ACT → agent-desktop click @e5 (or type, select, toggle...)
4. VERIFY → agent-desktop snapshot again to confirm state change
5. REPEAT → Continue until task is complete
Always snapshot before acting. Refs are snapshot-scoped and become stale after UI changes.
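The loop above can be sketched from the calling agent's side. This is a minimal illustration, not part of agent-desktop itself: `run_cli`, `observe_act`, and the `choose_ref` callback are hypothetical names, and the contents of the snapshot's `data` object are left to the caller because they are not specified here. Only the `snapshot --app ... -i` and `click @eN` invocations and the `ok`/`data`/`error` envelope fields come from this document.

```python
import json
import subprocess
from typing import Callable, List, Optional

# A runner executes one CLI invocation and returns its stdout as text.
Runner = Callable[[List[str]], str]


def run_cli(args: List[str]) -> str:
    """Run agent-desktop with the given arguments and return its stdout."""
    # check=False: errors are also reported inside the JSON envelope.
    proc = subprocess.run(
        ["agent-desktop", *args], capture_output=True, text=True, check=False
    )
    return proc.stdout


def observe_act(
    app: str,
    choose_ref: Callable[[dict], Optional[str]],
    run: Runner = run_cli,
    max_steps: int = 10,
) -> int:
    """Snapshot, let the caller pick a ref, click it, and repeat.

    choose_ref is a hypothetical callback: it receives the snapshot's "data"
    object and returns a ref such as "@e5", or None when the task is done.
    Returns the number of clicks performed.
    """
    clicks = 0
    for _ in range(max_steps):
        # OBSERVE: take a fresh snapshot every time, because refs go stale.
        snapshot = json.loads(run(["snapshot", "--app", app, "-i"]))
        if not snapshot["ok"]:
            raise RuntimeError(snapshot["error"]["message"])
        # REASON: delegated to the calling agent.
        ref = choose_ref(snapshot["data"])
        if ref is None:
            break
        # ACT: click the chosen element; the next snapshot serves as VERIFY.
        result = json.loads(run(["click", ref]))
        if not result["ok"]:
            raise RuntimeError(result["error"]["message"])
        clicks += 1
    return clicks
```

Passing the runner as a parameter keeps the loop testable without a desktop session: a fake runner can return canned envelopes, while the default shells out to the real CLI.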
Ref System
- Refs assigned depth-first: @e1, @e2, @e3...
- Only interactive elements get refs: button, textfield, checkbox, link, menuitem, tab, slider, combobox, treeitem, cell
- Static text, groups, containers remain in tree for context but have no ref
- Refs are deterministic within a snapshot but NOT stable across snapshots if UI changed
- After any action that changes UI, run snapshot again for fresh refs
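To make the numbering rule concrete, here is a small sketch of depth-first ref assignment over a toy tree. It is an illustration of the rule, not agent-desktop's implementation: the node shape (`role` plus `children`), the function name `assign_refs`, and the choice of pre-order traversal are all assumptions. Only the list of interactive roles is taken from this document.

```python
import itertools

# Roles that receive a ref, as listed above.
INTERACTIVE_ROLES = {
    "button", "textfield", "checkbox", "link", "menuitem",
    "tab", "slider", "combobox", "treeitem", "cell",
}


def assign_refs(root: dict) -> dict:
    """Return a copy of the tree with "@eN" refs on interactive nodes.

    Assumes a hypothetical node shape: {"role": str, "children": [...]}.
    Nodes are visited parent first, then children left to right.
    """
    counter = itertools.count(1)

    def visit(node: dict) -> dict:
        out = {"role": node["role"]}
        if node["role"] in INTERACTIVE_ROLES:
            out["ref"] = f"@e{next(counter)}"
        # Non-interactive nodes stay in the tree for context, without a ref.
        out["children"] = [visit(child) for child in node.get("children", [])]
        return out

    return visit(root)
```

Because the numbering depends only on tree order, the same tree always yields the same refs, while inserting or removing one interactive element shifts every ref after it. That is why refs from an old snapshot cannot be reused once the UI has changed.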
JSON Output Contract
Every command returns a JSON envelope on stdout:
Success: { "version": "1.0", "ok": true, "command": "snapshot", "data": { ... } }
Error: { "version": "1.0", "ok": false, "command": "click", "error": { "code": "STALE_REF", "message": "...", "suggestion": "..." } }
Exit codes: 0 success, 1 structured error, 2 argument error.
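A caller will typically unwrap this envelope in one place. The sketch below assumes only the fields shown above (`ok`, `data`, `error.code`, `error.message`, `error.suggestion`); `AgentDesktopError` and `parse_envelope` are hypothetical helper names, not part of the CLI.

```python
import json


class AgentDesktopError(Exception):
    """Raised when an envelope reports ok: false (hypothetical helper)."""

    def __init__(self, code: str, message: str, suggestion: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.suggestion = suggestion


def parse_envelope(stdout: str) -> dict:
    """Return the "data" object on success, raise AgentDesktopError otherwise."""
    envelope = json.loads(stdout)
    if envelope["ok"]:
        return envelope["data"]
    error = envelope["error"]
    raise AgentDesktopError(
        error["code"], error["message"], error.get("suggestion", "")
    )
```

A calling agent could, for instance, catch `AgentDesktopError`, check whether `code` equals `"STALE_REF"`, and take a fresh snapshot before retrying the action.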
Error Codes