ClawKit Reliability Toolkit
Official · Verified · system · Safety 2/5

claw-mouse

Control a Linux X11 desktop by taking screenshots with scrot and moving the mouse, clicking, and typing with xdotool.

Why use this skill?

Automate Linux X11 desktops with the claw-mouse AI skill. It enables screenshot-driven vision loops, GUI clicks, and keyboard input for agent workflows.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/rylena/claw-mouse

What This Skill Does

The claw-mouse skill acts as an interface between an AI agent and your Linux desktop. By wrapping xdotool (mouse and keyboard input) and scrot (screenshots), it lets the agent see the screen and interact with GUI applications much as a human user would. It is designed for iterative 'vision loops': the agent captures a screenshot to observe the desktop state, decides which element to interact with, performs the action, and takes another screenshot to confirm the result before moving on to the next step. This makes it well suited to automating legacy software or web interfaces that lack dedicated APIs.
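
A minimal sketch of the observe, decide, act, confirm cycle described above, assuming the agent drives scrot and xdotool as subprocesses. This is not claw-mouse's own code: the decide callable is a hypothetical stand-in for the vision model, and vision_loop and run are illustrative names. The screenshot at the start of each iteration doubles as confirmation of the previous click.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# Hypothetical decision function: given a screenshot path, return the (x, y)
# point to click next, or None when the task is finished. In practice a
# vision model would sit behind this callable.
Decider = Callable[[Path], Optional[Tuple[int, int]]]
Runner = Callable[[List[str]], object]


def run(cmd: List[str]) -> None:
    """Execute one command; raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def vision_loop(decide: Decider, max_steps: int = 5, runner: Runner = run,
                workdir: Optional[Path] = None) -> int:
    """Observe, decide, act; repeat until `decide` returns None.

    Each iteration's screenshot also confirms the previous click.
    Returns the number of clicks performed.
    """
    workdir = workdir or Path(tempfile.mkdtemp())
    clicks = 0
    for step in range(max_steps):
        shot = workdir / f"step_{step}.png"
        runner(["scrot", str(shot)])          # observe: capture the screen
        target = decide(shot)                 # decide: choose the next target
        if target is None:                    # task complete
            break
        x, y = target
        runner(["xdotool", "mousemove", str(x), str(y)])  # act: move pointer
        runner(["xdotool", "click", "1"])                 # act: left click
        clicks += 1
    return clicks
```

Passing runner=print instead of the default gives a dry run that lists the commands without touching the display, and max_steps bounds the loop in case the decision function never signals completion.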

Installation

To install this skill, run:

clawhub install openclaw/skills/skills/rylena/claw-mouse

The skill also needs the X11 tools it wraps. On Debian-based systems, install them with:

sudo apt-get install -y xdotool scrot

Make sure your user has permission to connect to the current X server. In a headless or automated environment, either pass the --display and --xauthority flags explicitly or make sure the DISPLAY and XAUTHORITY environment variables are set correctly for the agent's process.
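
A minimal pre-flight check, assuming the agent launches xdotool and scrot as subprocesses. The function names missing_tools and x11_env are hypothetical and not part of the skill; the sketch sets the DISPLAY and XAUTHORITY environment variables rather than using the --display and --xauthority flags, and ":0" is only a placeholder display.

```python
import os
import shutil
from typing import Dict, List, Mapping, Optional

REQUIRED_TOOLS = ("xdotool", "scrot")


def missing_tools() -> List[str]:
    """Return the required binaries that are not found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]


def x11_env(display: str = ":0", xauthority: Optional[str] = None,
            base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: os.environ) and point it at an X server."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    if xauthority is not None:
        env["XAUTHORITY"] = xauthority  # cookie file used to authenticate
    return env


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing),
              "(try: sudo apt-get install -y xdotool scrot)")
    else:
        print("xdotool and scrot found")
```

For example, subprocess.run(["xdotool", "getmouselocation"], env=x11_env(":0", "/home/user/.Xauthority"), check=True) would query the pointer position on display :0; the Xauthority path there is a placeholder for your own.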

Use Cases

This skill is ideal for automating workflows that have no API. Common use cases include interacting with proprietary desktop applications that lack automation hooks, running end-to-end tests on GUI software, filling out complex forms in offline applications, and automating repetitive clicking tasks that have no keyboard shortcuts. It is particularly useful for cross-application workflows in which an agent must move data between a local GUI application and a web browser.

Example Prompts

  1. "Take a screenshot, locate the 'Export' button, click it, and then type 'report_final' to save the file."
  2. "Look at the current window list, activate the application labeled 'LibreOffice', and send a Ctrl+S shortcut to save progress."
  3. "Continuously monitor the screen for a notification popup; if one appears at coordinates (100, 100), click the 'Dismiss' button immediately."
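
Each of these prompts ultimately reduces to ordinary xdotool invocations. The sketch below shows one plausible translation into command lists; the helper names are hypothetical, and this is not necessarily how claw-mouse issues its commands. It relies on xdotool's command chaining, in which windowactivate with no window argument acts on the first window matched by the preceding search.

```python
import subprocess
from typing import List

Command = List[str]


def click_at(x: int, y: int, button: int = 1) -> List[Command]:
    """Move the pointer to (x, y) and click (button 1 = left), as in prompt 3."""
    return [["xdotool", "mousemove", str(x), str(y)],
            ["xdotool", "click", str(button)]]


def type_text(text: str) -> List[Command]:
    """Type a string into the focused window, as in prompt 1."""
    return [["xdotool", "type", text]]


def activate_and_save(window_name: str) -> List[Command]:
    """Prompt 2: focus the first window whose title matches, then press Ctrl+S."""
    return [
        # search collects matching windows; the chained windowactivate acts on
        # the first match, and --sync waits until the window is actually active.
        ["xdotool", "search", "--name", window_name, "windowactivate", "--sync"],
        ["xdotool", "key", "ctrl+s"],
    ]


def run_all(commands: List[Command]) -> None:
    """Execute commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

With these helpers, run_all(activate_and_save("LibreOffice")) would cover prompt 2 and run_all(click_at(100, 100)) the click in prompt 3. Finding the 'Export' button in prompt 1 still depends on the screenshot step, since xdotool only acts on coordinates and windows.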

Tips & Limitations

This skill is strictly limited to Linux environments running X11; it will not work on Wayland-only systems. Because the agent interacts with your active display, it can inadvertently disrupt your workflow if not used carefully. Always ensure the agent is configured to interact with the correct window or display context. When automating, provide explicit coordinates or descriptive window names to reduce error rates. For critical tasks, maintain a 'human-in-the-loop' verify-before-execute pattern to prevent unexpected behavior.
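
One way to apply the window-context and human-in-the-loop advice, sketched under the assumption that commands run through subprocess: before sending input, check that the focused window's title contains an expected string, then ask a person to approve the exact command. The function confirm_and_run and its parameters are hypothetical; only the title lookup uses real xdotool commands (getactivewindow chained with getwindowname).

```python
import subprocess
from typing import Callable, List


def active_window_name() -> str:
    """Title of the currently focused window (requires a running X server)."""
    out = subprocess.run(["xdotool", "getactivewindow", "getwindowname"],
                         check=True, capture_output=True, text=True)
    return out.stdout.strip()


def confirm_and_run(cmd: List[str], expected_window: str,
                    get_window: Callable[[], str] = active_window_name,
                    ask: Callable[[str], str] = input,
                    runner: Callable[[List[str]], object] =
                    lambda c: subprocess.run(c, check=True)) -> bool:
    """Run `cmd` only if the focused window matches and a human approves."""
    title = get_window()
    if expected_window not in title:
        # Wrong context: refuse rather than type into the wrong application.
        print(f"Refusing: focused window is {title!r}, expected {expected_window!r}")
        return False
    answer = ask(f"Run {' '.join(cmd)} in {title!r}? [y/N] ")
    if answer.strip().lower() != "y":
        return False  # anything other than an explicit "y" is a no
    runner(cmd)
    return True
```

Defaulting to "no" on any answer other than "y" keeps an accidental Enter from triggering input, at the cost of an extra keystroke for the operator.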

Metadata

Author: @rylena
Stars: 1133
Views: 0
Updated: 2026-02-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-rylena-claw-mouse": {
      "enabled": true,
      "auto_update": true
    }
  }
}
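
If clawhub.json already contains other settings, the entry has to be merged into its existing "plugins" object rather than pasted as a second top-level block. A hypothetical helper for that is sketched below; it assumes the file is plain JSON with a top-level "plugins" object, as in the snippet above, and it replaces any existing entry with the same plugin ID.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-rylena-claw-mouse"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of `config` with the plugin enabled; other plugins are kept."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_file(path: Path) -> None:
    """Read clawhub.json (or start empty), merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Returning a new dictionary instead of mutating the input makes it easy to diff the before and after configurations before writing the file.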

Tags (AI)

#automation #linux #gui #x11 #desktop
Safety Score: 2/5

Flags: file-read, file-write, code-execution