Official Verified productivity Safety 3/5

screen-vision

Local OCR and automation tool for macOS, built on Apple's Vision framework. Reads the screen at zero token cost.

Why use this skill?

Add screen-vision to OpenClaw for local macOS OCR, screen reading, and UI automation, with no token cost for reading the screen.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/ls18166407597-design/screen-vision

What This Skill Does

The screen-vision skill acts as the visual cortex for your OpenClaw agent. It uses the native macOS Vision framework to run Optical Character Recognition (OCR) directly on your local machine. Unlike cloud-based vision models, which require uploading screenshots to remote servers, this tool processes image data locally, so reading the screen costs no tokens and screenshots stay on your machine. It does more than read text: it reports the [X, Y] coordinates of the text it recognizes, which lets the agent locate button labels, status indicators, and input fields. By turning raw pixels into text with positions, it allows your agent to "see" and interact with any application that draws text on screen, whether or not that application provides an API or an accessibility tree.
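To make the coordinate mapping concrete, here is a minimal sketch of one step such a tool has to perform. Apple's Vision framework reports text bounding boxes in normalized units (0 to 1) with the origin at the bottom-left of the image, while most click APIs expect a top-left origin. The `NormalizedBox` type and `box_center_in_pixels` function are hypothetical names for illustration; the output format screen-vision actually uses is not documented on this page.

```python
from dataclasses import dataclass


@dataclass
class NormalizedBox:
    """Hypothetical box in Vision's convention: values in [0, 1], origin at bottom-left."""
    x: float
    y: float
    width: float
    height: float


def box_center_in_pixels(box, image_width, image_height):
    """Return the box center as (x, y) pixels measured from the top-left corner."""
    center_x = (box.x + box.width / 2) * image_width
    # Flip the vertical axis: Vision measures y upward from the bottom edge.
    center_y = (1 - (box.y + box.height / 2)) * image_height
    return round(center_x), round(center_y)
```

For a box covering the middle half of the width of a 1000x800 screenshot, `box_center_in_pixels(NormalizedBox(0.25, 0.5, 0.5, 0.25), 1000, 800)` gives the point an agent would click.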

Installation

To integrate this skill, run the following command in your OpenClaw terminal:

clawhub install openclaw/skills/skills/ls18166407597-design/screen-vision

After installation, grant the permissions the tool needs. Navigate to System Settings -> Privacy & Security -> Screen Recording and enable your terminal or OpenClaw app. Then do the same under Privacy & Security -> Accessibility so the agent can execute mouse clicks and keyboard input at the coordinates it retrieves.

Use Cases

This skill suits automating legacy software, applications with non-standard UIs such as Electron-based desktop apps, and professional creative tools that lack automation interfaces. Use it to monitor on-screen data as it changes, such as stock prices, server health alerts, or progress bars in long-running renders. It is also effective for workflow orchestration, where the agent needs to move data from a non-copyable UI element into a spreadsheet or messaging app.
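The monitoring use case comes down to pulling a number out of recognized text on each scan. The sketch below assumes the OCR step yields a plain string; `extract_percent` and `is_complete` are hypothetical helpers, not part of the skill.

```python
import re

# A percentage such as "82.4 %" or "100%"; the lookbehind rejects a match
# that starts in the middle of a longer number.
PERCENT_PATTERN = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d+)?)\s*%")


def extract_percent(ocr_text):
    """Return the first percentage between 0 and 100 found in the text, or None."""
    match = PERCENT_PATTERN.search(ocr_text)
    if match is None:
        return None
    value = float(match.group(1))
    return value if 0 <= value <= 100 else None


def is_complete(ocr_text, threshold=100.0):
    """True once the recognized text reports a percentage at or above the threshold."""
    percent = extract_percent(ocr_text)
    return percent is not None and percent >= threshold
```

An agent could call `is_complete` on each fresh scan and send its notification the first time it returns `True`.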

Example Prompts

  1. "Scan the current window for the 'Export' button and click it once you find it."
  2. "Monitor the progress bar in my rendering software and notify me in Slack when it reaches 100%."
  3. "Read the balance displayed in my crypto desktop wallet and save it to a local text file."
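A prompt like the first one reduces to a lookup over recognized text. The following sketch assumes, purely for illustration, that OCR results arrive as a list of dictionaries with `text`, `x`, and `y` keys; the skill's real result format may differ.

```python
def find_label(results, label):
    """Return (x, y) of the first result whose text equals the label.

    Matching ignores case and surrounding whitespace. Returns None when
    no result matches.
    """
    wanted = label.strip().casefold()
    for item in results:
        if item["text"].strip().casefold() == wanted:
            return item["x"], item["y"]
    return None


# Hypothetical scan output for a window with three buttons.
sample_results = [
    {"text": "File", "x": 24, "y": 12},
    {"text": " Export ", "x": 640, "y": 480},
    {"text": "Cancel", "x": 720, "y": 480},
]
```

Here `find_label(sample_results, "export")` returns the coordinates of the Export button, and a label that is not on screen returns `None`, so the agent can rescan instead of clicking blindly.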

Tips & Limitations

For best results, keep your display scaling at the standard setting. Because this tool relies on recognizing what is drawn on screen, dynamic UI changes or rapid animations may occasionally delay detection. Keep your desktop resolution constant while the agent runs automated tasks to prevent coordinate offsets. The tool cannot "see" through overlapping windows, so keep the target application in the foreground during execution.
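One likely source of coordinate offsets is the gap between screenshot pixels and the logical points used for mouse events: on high-density (Retina) displays a screenshot typically has more pixels than the display has points. The sketch below shows the conversion under the assumption that coordinates are measured in screenshot pixels; whether screen-vision already applies this correction is not stated here, and both function names are hypothetical.

```python
def pixels_to_points(x_px, y_px, scale_factor):
    """Convert screenshot pixel coordinates to logical screen points."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return x_px / scale_factor, y_px / scale_factor


def resolution_changed(baseline_size, current_size):
    """True when the (width, height) seen now differs from the one recorded at start."""
    return tuple(baseline_size) != tuple(current_size)
```

With a scale factor of 2, a label found at pixel (1200, 800) corresponds to point (600, 400). Checking `resolution_changed` before each click gives the agent a chance to rescan rather than act on stale coordinates.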

Metadata

Stars: 1601
Views: 2
Updated: 2026-02-27

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-ls18166407597-design-screen-vision": {
      "enabled": true,
      "auto_update": true
    }
  }
}
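If clawhub.json already lists other plugins, pasting the fragment over the file would drop them. As a rough sketch, the entry can be merged in instead. The plugin key and its two settings come from the snippet above; the helper names are hypothetical, and the file's location is left as a parameter because it is not specified here.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-ls18166407597-design-screen-vision"


def enable_plugin(config):
    """Return a copy of the config with the screen-vision entry added or replaced."""
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    updated["plugins"] = plugins
    return updated


def enable_plugin_in_file(path):
    """Merge the entry into the JSON file at path, creating the file if it is missing."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Existing plugin entries and any other top-level keys are preserved; only the screen-vision entry is written.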

Tags (AI)

#ocr #automation #macos #vision #screen-reading

Safety Score: 3/5

Flags: file-read, file-write, code-execution