desktop-agent-ops
Execute cross-platform desktop tasks through a packaged desktop automation skill that guides the main agent to observe the screen, focus apps and windows, call helper scripts for screenshots and input actions, verify each step, clean up task context, and only escalate to multi-agent collaboration when tasks become clearly multi-window or multi-app. Use when the user wants desktop GUI control, native app operation, window focus, screenshots, click and type flows, or cross-platform desktop workflows on macOS, Windows, or Linux.
What This Skill Does
The desktop-agent-ops skill is a desktop automation skill for OpenClaw that lets your agent operate GUI applications on macOS, Windows, and Linux. Rather than replaying fixed scripts, it runs a verification-first control loop that mirrors how a person works: look at the screen, act, then check the result. It also handles platform detection, dependency management, and OS permission requirements, so the agent can reliably focus windows, locate targets with OCR, and send precise clicks and keystrokes through its helper scripts.
Actions are scoped to the target window rather than the full screen, which lowers the chance of mis-targeted or collateral clicks. Every action follows a strict observability cycle: before acting, the agent verifies state by reading the window's bounds and capturing that window; after acting, it runs a confirmation step to check that the UI reacted as intended. This makes the skill well suited to legacy desktop software, complex system configuration panels, and other non-web interfaces that lack a native API.
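The observe, act, confirm cycle can be sketched in Python. This is a minimal illustration, not the skill's helper scripts: `WindowBounds`, `verified_click`, and the injected `get_bounds`, `capture`, `click`, and `ui_changed` callables are hypothetical stand-ins for whatever window-query, screenshot, and input backends the skill uses on each platform. The bounds check shows why window-scoped targeting limits collateral clicks: a point outside the window is rejected instead of landing elsewhere on screen.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class WindowBounds:
    """Screen-space rectangle of the target window (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int

    def to_screen(self, rel_x: int, rel_y: int) -> Tuple[int, int]:
        """Map window-relative coordinates to screen coordinates.

        Points outside the window are rejected, so a stale or mistaken
        target cannot turn into a click somewhere else on the screen.
        """
        if not (0 <= rel_x < self.width and 0 <= rel_y < self.height):
            raise ValueError(f"({rel_x}, {rel_y}) is outside the window")
        return self.x + rel_x, self.y + rel_y


def verified_click(
    get_bounds: Callable[[], Optional[WindowBounds]],
    capture: Callable[[WindowBounds], bytes],
    click: Callable[[int, int], None],
    rel_x: int,
    rel_y: int,
    ui_changed: Callable[[bytes, bytes], bool],
) -> bool:
    """Observe -> act -> confirm for one click; True if the UI reacted."""
    bounds = get_bounds()                   # observe: current window bounds
    if bounds is None:
        return False                        # window missing or not focusable
    before = capture(bounds)                # observe: window-only capture
    click(*bounds.to_screen(rel_x, rel_y))  # act inside the window
    after = capture(bounds)                 # confirm: capture the same window again
    return ui_changed(before, after)
```

Passing the platform-specific operations in as callables keeps the loop itself platform-neutral and lets it be exercised with fakes when no real desktop is available.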
Installation
To install, run the following command in your terminal:
clawhub install openclaw/skills/skills/appergb/desktop-agent-ops
On first run, the skill triggers a mandatory auto-setup gate. Its setup script, first_run_setup.py, checks your environment, installs required tools such as cliclick (macOS) and tesseract, verifies the Screen Recording and Accessibility permissions, and creates a dedicated virtual environment. When setup completes, export the path printed in the script's output as an environment variable to begin operations.
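A rough idea of the dependency part of such a gate, assuming only the two tools named above: the `REQUIRED_TOOLS` table and both functions are hypothetical, not taken from first_run_setup.py, and the permission checks (Screen Recording, Accessibility) are left out because they need OS-specific APIs.

```python
import platform
import shutil
import sys
from typing import Callable, Dict, List, Optional

# Hypothetical requirement table. cliclick and tesseract are the tools the
# setup gate is described as installing; the real script may check more.
REQUIRED_TOOLS: Dict[str, List[str]] = {
    "Darwin": ["cliclick", "tesseract"],  # cliclick is macOS-only
    "Windows": ["tesseract"],
    "Linux": ["tesseract"],
}


def missing_tools(
    system: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the required tools that are not on PATH for this platform."""
    system = system or platform.system()  # "Darwin", "Windows" or "Linux"
    if system not in REQUIRED_TOOLS:
        raise RuntimeError(f"unsupported platform: {system}")
    return [tool for tool in REQUIRED_TOOLS[system] if which(tool) is None]


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix
```

Calling `missing_tools()` with no arguments checks the current machine; an empty list means every listed tool was found on PATH.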
Use Cases
- Legacy Desktop Software: Automating data entry into local accounting or proprietary database applications that lack web frontends.
- System Administration: Interacting with OS-level settings windows to toggle features or verify status in a cross-platform manner.
- Automated QA Testing: Clicking through multi-step desktop installation wizards or software configuration flows.
- Cross-Platform Workflows: Coordinating local file manipulation via native GUI tools while managing terminal processes.
Example Prompts
- "Open the Calculator app, multiply 543 by 12, and type the result into the text file currently open in Notepad."
- "Find the Preferences window for the Slack desktop app, navigate to the Advanced tab, and verify that the 'Download logs' option is toggled on."
- "Open Finder on macOS, navigate to the Documents folder, and rename the most recent PDF file to 'Archive_Report.pdf'."
Tips & Limitations
- Verification is Key: Always define clear success criteria for your steps. The agent works best when it can see a distinct change in the UI.
- Window Focus: Ensure the target application is installed and accessible. The agent cannot interact with windows that are unresponsive or hidden behind system-level security prompts.
- Permissions: Grant your terminal or agent process the Accessibility and Screen Recording permissions in your OS settings; otherwise the auto-setup gate will fail.
- Complexity: This skill is optimized for single-app workflows and escalates to multi-agent collaboration only when a task clearly becomes multi-window or multi-app. If your task spans more than two applications, consider a multi-agent collaboration approach to keep the workflow stable.
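The escalation rule in the last tip can be expressed as a small check over a task plan. This is a hypothetical sketch: `Step` and `should_escalate` are illustrative names, and the threshold of two applications comes from the tip itself rather than from the skill's code.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    app: str     # application the step operates in
    action: str  # what the agent does there


def should_escalate(steps: List[Step], max_apps: int = 2) -> bool:
    """Hypothetical heuristic: stay single-agent unless the plan touches
    more than `max_apps` distinct applications."""
    return len({step.app for step in steps}) > max_apps


calculator_task = [
    Step("Calculator", "multiply 543 by 12"),
    Step("Calculator", "read the result"),
    Step("Notepad", "type the result into the open file"),
]
print(should_escalate(calculator_task))  # False: only two applications
```

Counting distinct applications, not steps, matches the tip's intent: a long flow inside one app stays with a single agent.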
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-appergb-desktop-agent-ops": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, code-execution
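If a clawhub.json already exists, the plugin entry has to be merged into it rather than pasted over it. A minimal Python sketch, assuming the file is plain JSON with a top-level `plugins` object as in the snippet; `enable_plugin` and `enable_in_file` are hypothetical helpers, not part of the clawhub CLI.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

PLUGIN_KEY = "official-appergb-desktop-agent-ops"


def enable_plugin(config: Dict[str, Any], key: str = PLUGIN_KEY) -> Dict[str, Any]:
    """Return a copy of `config` with the plugin entry enabled, leaving any
    other plugins and top-level settings untouched."""
    merged = copy.deepcopy(config)
    entry = merged.setdefault("plugins", {}).setdefault(key, {})
    entry.update({"enabled": True, "auto_update": True})
    return merged


def enable_in_file(path: Path) -> None:
    """Read clawhub.json (or start empty), merge the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

The merge sets `auto_update` to true even if an existing entry had turned it off; drop that key from the update if you manage updates yourself.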