ClawKit Logo
ClawKitReliability Toolkit
Back to Registry
Official Verified

surfagent-perception

Agent vision for web pages — scene summaries, attention-ranked elements, annotated screenshots, and state diffing via SurfAgent's perception engine.

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/agentossoftware/surfagent-perception
Or

SurfAgent Perception — Agent Vision Skill

How to see, understand, and verify web pages through SurfAgent's perception engine.


What This Is

SurfAgent Perception gives you human-like page understanding in ~200 tokens instead of parsing a 50K-token DOM. Three MCP tools, one workflow loop.

Without perception: You get a raw DOM dump or a dumb screenshot. You have to figure out what's on the page yourself.

With perception: You get a scene summary, ranked interactive elements, spatial clusters, viewport state, and optionally an annotated screenshot with numbered bounding boxes + a legend mapping each number to a ref.

Requires: SurfAgent daemon running (port 7201) with a managed Chrome instance (port 9222).


The Three Tools

surf_perceive — Your Primary Eyes

The main tool. Call this to understand what's on screen.

Parameters:

ParamTypeDefaultDescription
tabIdstringactive tabTarget a specific tab
sincestringState token from a previous call. Includes delta of what changed
maxAnnotationsnumber15How many elements to rank (1-50)
annotatebooleanfalseInclude annotated screenshot with numbered bounding boxes

Returns:

  • Scene summary — one-liner + top 5 ranked actions + state notes (blockers, modals, forms, auth, scroll %)
  • Viewport info — scroll position, fold split, document dimensions
  • Top elements — attention-ranked, with refs for clicking/typing
  • Clusters — semantic groups of related elements (nav cluster, form cluster, etc.)
  • State token — pass as since next time to get delta
  • Annotated screenshot + legend (if annotate: true)

When to use: Start of any page interaction. After navigation. When you need to understand the page before acting.


surf_annotate — Quick Visual Reference

Lighter than surf_perceive. Just the annotated screenshot + legend, no scene analysis.

Parameters:

ParamTypeDefaultDescription
tabIdstringactive tabTarget a specific tab
maxAnnotationsnumber15How many elements to annotate (1-50)

Returns:

  • Annotated screenshot (base64 PNG) with numbered colored bounding boxes
  • Legend mapping each number to element ref, role, location, and status

When to use: When you already know the page context but need to identify specific elements visually. Good for "which button do I click?" scenarios.


surf_scene_diff — What Changed?

Compare current state to a previous state token. Answers: "did my action work?"

Parameters:

ParamTypeRequiredDescription
sincestringState token from a previous surf_perceive call
tabIdstringTarget a specific tab

Metadata

Stars4473
Views1
Updated2026-05-01
View Author Profile
AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-agentossoftware-surfagent-perception": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.