What This Skill Does

The Windows Skills package gives the OpenClaw agent a set of desktop-automation tools for Windows. With it, the agent can capture the screen, extract text from images using Tesseract OCR, and locate UI elements by image matching. Using libraries such as mss, pyautogui, and opencv, the agent can find buttons, menus, and text fields on screen and drive them with simulated mouse and keyboard input, which lets it operate software that has no traditional API.

Installation

To integrate this skill into your environment, use the OpenClaw command-line interface:

clawhub install openclaw/skills/skills/civen-cn/windows-skills

Install the core Python dependencies with pip:

pip install mss pytesseract pillow pyautogui opencv-python numpy

You must also install the Tesseract OCR engine on your Windows machine and add its install directory to your system PATH. The OCR module cannot run unless it can find the Tesseract executable.
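A quick way to confirm the setup is a small check script. The sketch below uses only the standard library; the function names are hypothetical and chosen for this example. Note that two packages are imported under different names than they are installed with: pillow as PIL and opencv-python as cv2.

```python
import importlib.util
import shutil

# Import names for the pip packages listed above
# (pillow is imported as PIL, opencv-python as cv2).
REQUIRED_MODULES = ["mss", "pytesseract", "PIL", "pyautogui", "cv2", "numpy"]


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def tesseract_on_path():
    """Return the full path of the tesseract executable, or None if not on PATH."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing Python modules:", ", ".join(missing) if missing else "none")
    print("Tesseract executable:", tesseract_on_path() or "not found on PATH")
```

If Tesseract is installed but not on PATH, pytesseract can also be pointed at the executable by setting pytesseract.pytesseract.tesseract_cmd to its full path.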

Use Cases

This skill suits legacy desktop applications that offer no programmatic control. Typical uses include data entry automation, where the agent reads forms via OCR and clicks the appropriate inputs; periodic screenshot reporting; monitoring application status by verifying UI state; and repetitive workflows such as opening software, clicking specific navigation tabs, and extracting information into reports.

Example Prompts

"Take a screenshot of the Notepad window and extract all the text currently written inside it."
"Locate the 'Submit' button on the current active screen using the provided template image and click it."
"Monitor the background application and send me a notification if the alert icon appears on the toolbar."

Tips & Limitations

Image-based automation depends heavily on screen consistency. Keep the resolution and display scaling identical to the settings used when the template images were created. OCR accuracy is sensitive to font clarity and image resolution; high-contrast images perform significantly better. Always handle the case where the agent cannot locate an element, for example with a timeout, so the script does not hang during an automation task.
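One way to keep a lookup from hanging is to poll with a deadline. The sketch below is a minimal, library-agnostic version: wait_for and its parameters are hypothetical names, and the locator is any callable that returns a position or None. Depending on the library and version, a failed lookup may return None or raise an exception, so the helper accepts a tuple of exception types to treat as "not found".

```python
import time


def wait_for(locate, timeout=10.0, interval=0.5, not_found_errors=()):
    """Call locate() until it returns a non-None value or the timeout expires.

    Hypothetical helper: exceptions listed in not_found_errors are treated
    the same as a None result. Raises TimeoutError if nothing is found in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            found = locate()
        except not_found_errors:
            found = None
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element not found within {timeout} seconds")
        time.sleep(interval)


# Stand-in locator that only succeeds on the third attempt.
attempts = {"count": 0}


def fake_locate():
    attempts["count"] += 1
    return (140, 65) if attempts["count"] >= 3 else None


print(wait_for(fake_locate, timeout=2.0, interval=0.01))  # (140, 65)
```

The calling code can catch TimeoutError and decide whether to retry, take a diagnostic screenshot, or report the failure, rather than waiting indefinitely.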
