What This Skill Does

The claw-mouse skill is an interface between an AI agent and your Linux desktop. It wraps xdotool (pointer, keyboard, and window control) and scrot (screenshots) so that the agent can see the screen and operate GUI applications the way a human would. It is designed for iterative 'vision loops': the agent captures a screenshot, decides which element to interact with, acts, and confirms the result before moving to the next step. This makes it useful for automating legacy software or web interfaces that lack dedicated APIs.
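The observe, decide, act cycle can be sketched in a few lines of Python. This is only an illustration of the loop, not the skill's actual implementation: the function names, the `decide` callback, and the screenshot directory are all hypothetical, and the sketch calls scrot and xdotool directly rather than going through the skill.

```python
import os
import subprocess
from typing import Callable, List, Optional, Tuple

Command = List[str]


def screenshot_cmd(path: str) -> Command:
    # scrot captures the whole screen into the given file.
    return ["scrot", path]


def click_cmd(x: int, y: int, button: int = 1) -> Command:
    # xdotool accepts chained commands: move the pointer, then click.
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def vision_loop(
    decide: Callable[[str], Optional[Tuple[int, int]]],
    run: Callable[..., object] = subprocess.run,
    shot_dir: str = "/tmp",
    max_steps: int = 10,
) -> int:
    """Screenshot, ask `decide` where to click, click, repeat.

    `decide` is a hypothetical callback (for example, a vision model) that
    returns (x, y) or None when there is nothing left to do.
    Returns the number of clicks performed.
    """
    clicks = 0
    for step in range(max_steps):
        # A fresh file name per step avoids relying on scrot's overwrite behaviour.
        shot = os.path.join(shot_dir, f"step-{step}.png")
        run(screenshot_cmd(shot), check=True)
        target = decide(shot)
        if target is None:
            break
        run(click_cmd(*target), check=True)
        clicks += 1
    return clicks
```

Because each iteration starts with a new screenshot, the capture taken at step N+1 doubles as the confirmation of the action taken at step N.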

Installation

To install this skill, run: clawhub install openclaw/skills/skills/rylena/claw-mouse. The skill depends on the X11 tools xdotool and scrot; on Debian-based systems, install them with sudo apt-get install -y xdotool scrot. Ensure your user has permission to connect to the current X server. In a headless or automated environment, either pass the --display and --xauthority flags explicitly or make sure the DISPLAY and XAUTHORITY environment variables are set for the agent's process.
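Before handing control to the agent, it can help to verify these prerequisites from the same process that will run it. The following is a minimal sketch under stated assumptions: `missing_prerequisites` is a hypothetical helper, not part of the skill, and it only checks that the two tools are on PATH, that DISPLAY is set, and that XAUTHORITY (if set) points to an existing file.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

REQUIRED_TOOLS = ("xdotool", "scrot")


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set")
    xauth = env.get("XAUTHORITY")
    # When XAUTHORITY is unset, X clients fall back to ~/.Xauthority,
    # so only an explicit but broken value is reported here.
    if xauth and not exists(xauth):
        problems.append(f"XAUTHORITY points to a missing file: {xauth}")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current process environment; the parameters exist only so the check can be exercised without a real X session.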

Use Cases

This skill is suited to workflows that offer no API. Common use cases include interacting with proprietary desktop applications that lack automation hooks, end-to-end testing of GUI software, filling out complex forms in offline software, and automating repetitive clicking tasks that have no keyboard shortcuts. It is particularly useful for cross-application workflows in which an agent must move data between a local GUI app and a web browser.

Example Prompts

"Take a screenshot, locate the 'Export' button, click it, and then type 'report_final' to save the file."
"Look at the current window list, activate the application labeled 'LibreOffice', and send a Ctrl+S shortcut to save progress."
"Continuously monitor the screen for a notification popup; if one appears at coordinates (100, 100), click the 'Dismiss' button immediately."
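To give a sense of what the second prompt might translate to underneath, here is a rough sketch that calls xdotool directly. How the skill itself maps prompts to commands is not documented here, so the helper names and the flow are assumptions made for illustration.

```python
import subprocess
from typing import Callable, Optional


def first_window_id(name: str, run: Callable[..., object] = subprocess.run) -> Optional[str]:
    # `xdotool search --name` prints one matching window id per line and
    # exits non-zero when nothing matches, so the exit code is not checked.
    result = run(["xdotool", "search", "--name", name], capture_output=True, text=True)
    ids = result.stdout.split()
    return ids[0] if ids else None


def activate_and_save(name: str, run: Callable[..., object] = subprocess.run) -> bool:
    """Focus the first window whose title matches `name`, then send Ctrl+S."""
    window_id = first_window_id(name, run)
    if window_id is None:
        return False
    # --sync waits until the window is actually active before returning.
    run(["xdotool", "windowactivate", "--sync", window_id], check=True)
    run(["xdotool", "key", "ctrl+s"], check=True)
    return True
```

For example, `activate_and_save("LibreOffice")` returns False without sending any keystrokes when no matching window is found, which avoids typing the shortcut into whatever window happens to have focus.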

Tips & Limitations

This skill works only on Linux systems running X11; it will not work on Wayland-only systems. Because the agent drives your active display, it can disrupt your own work if used carelessly, so make sure it targets the correct window or display. Provide explicit coordinates or descriptive window names to reduce errors. For critical tasks, keep a human in the loop with a verify-before-execute pattern, so that each action is approved before it runs.
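One simple way to implement a verify-before-execute gate is to show each command to the operator and run it only on an explicit yes. The sketch below is an assumption about how such a gate could look, not a feature of the skill; `confirm_then_run` is a hypothetical name.

```python
import shlex
import subprocess
from typing import Callable, List


def confirm_then_run(
    cmd: List[str],
    ask: Callable[[str], str] = input,
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Show the command, run it only if the operator answers y or yes."""
    answer = ask(f"Run `{shlex.join(cmd)}`? [y/N] ")
    if answer.strip().lower() not in ("y", "yes"):
        return False  # anything other than an explicit yes is a refusal
    run(cmd, check=True)
    return True
```

Defaulting to "no" means that an empty answer or a stray keypress never triggers a click or keystroke on the live desktop.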
