
surfagent-perception

Agent vision for web pages — scene summaries, attention-ranked elements, annotated screenshots, and state diffing via SurfAgent's perception engine.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/agentossoftware/surfagent-perception


SurfAgent Perception — Agent Vision Skill

How to see, understand, and verify web pages through SurfAgent's perception engine.

What This Is

SurfAgent Perception gives you human-like page understanding in ~200 tokens instead of parsing a 50K-token DOM. Three MCP tools, one workflow loop.

Without perception: You get a raw DOM dump or an unannotated screenshot, and you have to work out what's on the page yourself.

With perception: You get a scene summary, ranked interactive elements, spatial clusters, viewport state, and optionally an annotated screenshot with numbered bounding boxes + a legend mapping each number to a ref.

Requires: SurfAgent daemon running (port 7201) with a managed Chrome instance (port 9222).
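
Before calling any tool, it can help to confirm that both prerequisites are listening. The following is a minimal sketch, assuming the daemon and Chrome run on localhost (the text above gives only the port numbers). `port_open` and `check_prerequisites` are illustrative helper names, not part of SurfAgent.

```python
import socket

# Ports come from the requirement above; the host is an assumption.
SURFAGENT_PORT = 7201
CHROME_DEBUG_PORT = 9222


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_prerequisites(host: str = "127.0.0.1") -> dict[str, bool]:
    """Report whether each required service accepts TCP connections."""
    return {
        "surfagent_daemon": port_open(host, SURFAGENT_PORT),
        "managed_chrome": port_open(host, CHROME_DEBUG_PORT),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

A successful TCP connection only shows that something is listening on the port; it does not prove the listener is SurfAgent or a managed Chrome instance.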

The Three Tools

`surf_perceive` — Your Primary Eyes

The main tool. Call this to understand what's on screen.

Parameters:

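| Param | Type | Default | Description |
|---|---|---|---|
| `tabId` | string | active tab | Target a specific tab |
| `since` | string | none | State token from a previous call. When set, the response includes a delta of what changed |
| `maxAnnotations` | number | 15 | How many elements to rank (1-50) |
| `annotate` | boolean | false | Include an annotated screenshot with numbered bounding boxes |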
`annotate`	boolean	false	Include annotated screenshot with numbered bounding boxes

Returns:

- Scene summary: one-liner, top 5 ranked actions, and state notes (blockers, modals, forms, auth, scroll %)
- Viewport info: scroll position, fold split, document dimensions
- Top elements: attention-ranked, with refs for clicking/typing
- Clusters: semantic groups of related elements (nav cluster, form cluster, etc.)
- State token: pass as `since` next time to get a delta
- Annotated screenshot + legend (if `annotate: true`)

When to use: Start of any page interaction. After navigation. When you need to understand the page before acting.
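
The parameter table above can be turned into a request payload. The sketch below builds a JSON-RPC `tools/call` message of the kind MCP clients send. How the message reaches SurfAgent's MCP server (stdio, HTTP, or a client library) is not covered by this page and is left out. `build_perceive_request` is an illustrative helper name.

```python
import json
from typing import Any, Optional


def build_perceive_request(
    request_id: int,
    tab_id: Optional[str] = None,
    since: Optional[str] = None,
    max_annotations: int = 15,
    annotate: bool = False,
) -> dict[str, Any]:
    """Build an MCP `tools/call` request for surf_perceive."""
    # The documented range for maxAnnotations is 1-50.
    if not 1 <= max_annotations <= 50:
        raise ValueError("maxAnnotations must be between 1 and 50")

    arguments: dict[str, Any] = {
        "maxAnnotations": max_annotations,
        "annotate": annotate,
    }
    # Optional parameters are omitted so the tool applies its own defaults
    # (active tab, no delta).
    if tab_id is not None:
        arguments["tabId"] = tab_id
    if since is not None:
        arguments["since"] = since

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "surf_perceive", "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(build_perceive_request(1, annotate=True), indent=2))
```

Validating `maxAnnotations` on the client side is a design choice of this sketch; the tool may clamp or reject out-of-range values itself.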

`surf_annotate` — Quick Visual Reference

Lighter than `surf_perceive`: just the annotated screenshot and legend, with no scene analysis.

Parameters:

| Param | Type | Default | Description |
|---|---|---|---|
| `tabId` | string | active tab | Target a specific tab |
| `maxAnnotations` | number | 15 | How many elements to annotate (1-50) |

Returns:

- Annotated screenshot (base64 PNG) with numbered colored bounding boxes
- Legend mapping each number to element ref, role, location, and status

When to use: When you already know the page context but need to identify specific elements visually. Good for "which button do I click?" scenarios.
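
Handling the two return values might look like the sketch below. It decodes the base64 PNG and looks up a ref by box number. The legend shape used here (a list of dicts with `number` and `ref` keys) is an assumption for illustration only, as are the helper names `decode_png` and `find_ref`; check the actual tool output before relying on either.

```python
import base64

# Every PNG file begins with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_data: str) -> bytes:
    """Decode a base64 string and verify it starts with the PNG signature."""
    raw = base64.b64decode(b64_data, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw


def find_ref(legend: list[dict], number: int) -> str:
    """Return the element ref for a numbered bounding box.

    Assumes each legend entry is a dict with "number" and "ref" keys;
    this shape is hypothetical.
    """
    for entry in legend:
        if entry.get("number") == number:
            return entry["ref"]
    raise KeyError(f"no legend entry for box {number}")
```

If the screenshot arrives as a data URL rather than bare base64, the prefix would need to be stripped before decoding.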

`surf_scene_diff` — What Changed?

Compare current state to a previous state token. Answers: "did my action work?"

Parameters:

| Param | Type | Required | Description |
|---|---|---|---|
| `since` | string | yes | State token from a previous `surf_perceive` call |
| `tabId` | string | no | Target a specific tab |
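
The perceive, act, verify loop can be sketched as follows. `call_tool` stands in for whatever MCP client function you use to invoke a tool; both it and the `stateToken` field name are assumptions, since this page does not show the response schema.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def act_and_verify(
    call_tool: ToolCaller,
    action: Callable[[dict[str, Any]], None],
) -> dict[str, Any]:
    """Perceive the page, run an action, then ask what changed."""
    scene = call_tool("surf_perceive", {})
    # "stateToken" is an assumed field name for the returned state token.
    token = scene["stateToken"]

    # The action would typically click or type using a ref from the scene.
    action(scene)

    # Passing the earlier token as `since` asks for the change since then.
    return call_tool("surf_scene_diff", {"since": token})
```

Injecting `call_tool` keeps the loop independent of the transport and makes it easy to exercise with a stub.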


Metadata

Author: @agentossoftware

Stars: 4473

Updated: 2026-05-01


Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-agentossoftware-surfagent-perception": {
      "enabled": true,
      "auto_update": true
    }
  }
}
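
If `clawhub.json` already lists other plugins, pasting the snippet over the file would drop them. The sketch below merges the entry instead. The path to `clawhub.json` is left to the caller because this page does not say where the file lives, and `enable_plugin` is an illustrative helper, not a ClawHub command.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-agentossoftware-surfagent-perception"


def enable_plugin(config_path: Path) -> dict:
    """Add the plugin entry to clawhub.json, preserving existing settings."""
    # Start from the existing config if there is one, else an empty object.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```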

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.

