claw-mouse
Control a Linux X11 desktop by taking screenshots and moving/clicking/typing via xdotool + scrot.
Why use this skill?
Automate Linux X11 desktops using the claw-mouse AI skill. Enables vision loops, GUI clicking, and keyboard input for advanced agent workflows.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/rylena/claw-mouse
What This Skill Does
The claw-mouse skill acts as an interface between an AI agent and your Linux desktop environment. By wrapping xdotool and scrot, it lets the agent see the screen through screenshots and interact with GUI applications through simulated mouse and keyboard input, as a human would. It is designed for iterative 'vision loops': the agent observes the desktop state via a screenshot, decides which element to interact with, performs the action, and confirms the result before moving to the next task. This makes it well suited to automating legacy software or web interfaces that lack dedicated APIs.
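The vision loop described above can be sketched roughly as follows. This is a minimal illustration that calls scrot and xdotool directly; the skill's own command interface is not shown in this listing, so the function names (`vision_loop`, `decide`) and the half-second settle delay are assumptions made for the example, not part of claw-mouse.

```python
import subprocess
import tempfile
import time
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]


def screenshot_cmd(path: str) -> List[str]:
    """Build the scrot command that saves a full-screen capture to `path`."""
    return ["scrot", path]


def click_cmds(x: int, y: int, button: int = 1) -> List[List[str]]:
    """Build the xdotool commands that move the pointer to (x, y) and click."""
    return [
        ["xdotool", "mousemove", str(x), str(y)],
        ["xdotool", "click", str(button)],
    ]


def vision_loop(
    decide: Callable[[str], Optional[Point]],
    max_steps: int = 5,
    run: Callable[..., object] = subprocess.run,
    settle: float = 0.5,
) -> int:
    """Observe, decide, act, repeat. Returns the number of clicks performed.

    `decide` is a hypothetical callback (for example, a vision model) that
    receives a screenshot path and returns a click target, or None when done.
    """
    # A fresh directory avoids clashing with screenshots from earlier runs.
    workdir = tempfile.mkdtemp(prefix="claw_")
    for step in range(max_steps):
        path = f"{workdir}/step_{step}.png"
        run(screenshot_cmd(path), check=True)   # observe
        target = decide(path)                   # decide
        if target is None:
            return step
        for cmd in click_cmds(*target):         # act
            run(cmd, check=True)
        time.sleep(settle)  # let the UI redraw before the next screenshot
    return max_steps
```

The next iteration's screenshot doubles as the confirmation step: `decide` sees the result of the previous click before choosing another action. The `max_steps` cap keeps a confused agent from clicking indefinitely.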
Installation
To install this skill, run: clawhub install openclaw/skills/skills/rylena/claw-mouse. Before the skill can function, your system must have the X11 dependencies xdotool and scrot installed. On Debian-based systems, run sudo apt-get install -y xdotool scrot. Ensure your user has permission to connect to the current X server. If you are operating in a headless or automated environment, you must either pass the --display and --xauthority flags explicitly or ensure the DISPLAY and XAUTHORITY environment variables are set correctly for the agent's process.
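As a rough pre-flight check, the dependency and environment requirements above could be verified along these lines. The helper names (`missing_tools`, `x11_env`) are hypothetical and only illustrate the idea; the sketch sets the standard X11 `DISPLAY` and `XAUTHORITY` environment variables rather than using the skill's own flags.

```python
import os
import shutil
from typing import Callable, Dict, List, Mapping, Optional

REQUIRED_TOOLS = ("xdotool", "scrot")


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the required binaries that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def x11_env(
    display: str,
    xauthority: Optional[str] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy an environment and point it at a specific X display.

    `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    if xauthority:
        env["XAUTHORITY"] = xauthority
    return env


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing tools:", ", ".join(absent))
    else:
        print("xdotool and scrot are available")
```

The dictionary returned by `x11_env(":0")` can be passed as the `env` argument of `subprocess.run` so that a command such as `xdotool getmouselocation` talks to the intended display.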
Use Cases
This skill is ideal for automating workflows that offer no API. Common use cases include interacting with proprietary desktop applications that lack automation hooks, performing end-to-end testing on GUI-based software, filling out complex forms in offline software, and automating repetitive UI clicking tasks that have no keyboard shortcuts. It is particularly useful for cross-application workflows where an agent must move data between a local GUI app and a web browser.
Example Prompts
- "Take a screenshot, locate the 'Export' button, click it, and then type 'report_final' to save the file."
- "Look at the current window list, activate the application labeled 'LibreOffice', and send a Ctrl+S shortcut to save progress."
- "Continuously monitor the screen for a notification popup; if one appears at coordinates (100, 100), click the 'Dismiss' button immediately."
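To make the second prompt concrete, here is one way it might translate into xdotool calls. This is a sketch under assumptions: it invokes xdotool directly instead of going through the skill, and the helper names are invented for illustration.

```python
import subprocess
from typing import List


def parse_window_ids(search_output: str) -> List[str]:
    """Extract window IDs from `xdotool search` output (one ID per line)."""
    return [
        line.strip()
        for line in search_output.splitlines()
        if line.strip().isdigit()
    ]


def save_in_window_cmds(window_id: str) -> List[List[str]]:
    """Build commands that focus a window and then send Ctrl+S to it."""
    return [
        ["xdotool", "windowactivate", "--sync", window_id],
        ["xdotool", "key", "ctrl+s"],
    ]


def save_in_app(name: str) -> bool:
    """Find a window whose title matches `name` and send it Ctrl+S.

    Returns False when no matching window is found.
    """
    result = subprocess.run(
        ["xdotool", "search", "--name", name],
        capture_output=True,
        text=True,
    )
    ids = parse_window_ids(result.stdout)
    if not ids:
        return False
    # Assumption: the first match is the window we want.
    for cmd in save_in_window_cmds(ids[0]):
        subprocess.run(cmd, check=True)
    return True
```

Calling `save_in_app("LibreOffice")` would activate the first matching window and send the shortcut. Title searches can match hidden helper windows, so a real workflow may need to narrow the search (xdotool offers `--onlyvisible`) or confirm the target with a screenshot first.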
Tips & Limitations
This skill is strictly limited to Linux environments running X11; it will not work on Wayland-only systems. Because the agent interacts with your active display, it can inadvertently disrupt your workflow if not used carefully. Always ensure the agent is configured to interact with the correct window or display context. When automating, provide explicit coordinates or descriptive window names to reduce error rates. For critical tasks, maintain a 'human-in-the-loop' verify-before-execute pattern to prevent unexpected behavior.
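The verify-before-execute pattern mentioned above can be as simple as a confirmation gate in front of every command. The following is an illustrative sketch, not the skill's built-in behavior; `confirm_then_run` is a hypothetical helper.

```python
import shlex
import subprocess
from typing import Callable, List


def confirm_then_run(
    cmd: List[str],
    ask: Callable[[str], str] = input,
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Show the command to a human and run it only after an explicit 'y'.

    Returns True if the command was executed, False if it was declined.
    """
    answer = ask(f"Run {shlex.join(cmd)}? [y/N] ")
    if answer.strip().lower() != "y":
        return False
    run(cmd, check=True)
    return True
```

For example, `confirm_then_run(["xdotool", "click", "1"])` prints the exact command and waits for approval. Defaulting to "no" means an empty or mistyped answer never triggers a click.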
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-rylena-claw-mouse": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-read, file-write, code-execution