ClawKit Reliability Toolkit
Official · Verified · AI Models

computer-vision-expert

SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.

Why use this skill?

Master SOTA computer vision with the computer-vision-expert skill. Deploy YOLO26, SAM 3, and VLM pipelines for real-time detection and 3D spatial intelligence.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/zorrong/computer-vision-expert
Or add the plugin to your clawhub.json manually (see "Add to Configuration" below).

What This Skill Does

The computer-vision-expert skill provides OpenClaw agents with state-of-the-art visual perception capabilities built on the 2026 industry-standard model suite. It serves as an architectural bridge between high-performance real-time detection (YOLO26) and semantic visual reasoning (SAM 3 and vision-language models). With it, an agent can perform complex computer vision tasks such as zero-shot segmentation, 3D scene reconstruction, and conversational image analysis without extensive custom model training. By relying on NMS-free detection architectures and vision-language integration, the skill simplifies deploying industrial-grade visual pipelines on edge devices, letting autonomous systems see and understand their environment with high accuracy.
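To make the "NMS-free" claim concrete, here is a minimal sketch of classic IoU-based non-maximum suppression, the post-processing step that NMS-free detectors (as described for YOLO26) eliminate. The function names and thresholds are illustrative, not part of any skill API.

```python
# Sketch: classic greedy NMS. NMS-free detectors skip this entire step,
# which removes a latency- and tuning-sensitive stage from the pipeline.

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box in each cluster of overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two near-duplicate detections of one object, plus one distinct object:
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]
```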

Installation

To integrate this skill into your agent, run the following command in your terminal: clawhub install openclaw/skills/skills/zorrong/computer-vision-expert

Use Cases

  • Autonomous Industrial Inspection: Deploy YOLO26 to rapidly detect defects on a conveyor belt, then run SAM 3 to generate precise masks for automated measurement.
  • Spatial Awareness & Robotics: Combine Depth Anything V2 with Visual SLAM for real-time obstacle avoidance and navigation in unknown indoor environments.
  • Automated Data Extraction: Use Qwen2-VL or PaliGemma 2 for visual question answering over complex engineering schematics or logistics manifests, converting visual data into structured JSON.
  • Edge Optimization: Move from heavy, research-focused models to efficient ONNX-exported architectures optimized for TensorRT and NPU deployment.
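The detect-then-segment handoff in the inspection use case can be sketched as plain box bookkeeping: detector boxes in center (cx, cy, w, h) format are converted to the (x1, y1, x2, y2) corner boxes that SAM-style promptable segmenters typically accept, clipped to the image bounds. The formats and names here are assumptions for illustration, not a fixed API of this skill.

```python
# Hypothetical glue between a YOLO-style detector and a promptable
# segmenter: convert center-format boxes to clipped corner-format prompts.

def to_box_prompts(detections, img_w, img_h):
    """Map (cx, cy, w, h) detections to (x1, y1, x2, y2) box prompts."""
    prompts = []
    for cx, cy, w, h in detections:
        x1 = max(0.0, cx - w / 2)
        y1 = max(0.0, cy - h / 2)
        x2 = min(float(img_w), cx + w / 2)
        y2 = min(float(img_h), cy + h / 2)
        prompts.append((x1, y1, x2, y2))
    return prompts

# A defect detected near the left edge gets clipped to the frame:
print(to_box_prompts([(10, 50, 40, 40)], 640, 480))
# -> [(0.0, 30.0, 30.0, 70.0)]
```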

Example Prompts

  1. "Analyze this live camera feed and identify all safety hazards; output the coordinates and a bounding box for each object detected by YOLO26."
  2. "Use SAM 3 to create a precise mask of the blue container in the foreground, then perform a 3D reconstruction of the area within that segment."
  3. "Look at these architectural blueprints and answer the following: How many structural load-bearing points are clearly visible and what is their approximate spatial relation to the main entry?"
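The structured output requested in prompt 1 might look like the sketch below: raw detections (label, confidence, box) packed into JSON a downstream agent can consume. The field names and schema are illustrative assumptions, not an output format guaranteed by the skill.

```python
# Hypothetical sketch: serialize detections into structured JSON.
import json

def detections_to_json(detections):
    """Pack (label, confidence, (x1, y1, x2, y2)) tuples into a JSON report."""
    records = [
        {"label": label, "confidence": round(conf, 3),
         "bbox": {"x1": x1, "y1": y1, "x2": x2, "y2": y2}}
        for label, conf, (x1, y1, x2, y2) in detections
    ]
    return json.dumps({"detections": records}, indent=2)

report = detections_to_json([("missing_guard_rail", 0.912, (34, 80, 210, 260))])
print(report)
```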

Tips & Limitations

  • Performance Strategy: Always pair YOLO26 with SAM 3; use the former for broad-sweep candidate detection and the latter for pixel-perfect refinement to maintain high FPS.
  • Resource Management: While YOLO26 is optimized for edge devices, SAM 3 and advanced VLMs can be compute-intensive. Ensure your local hardware or cloud inference endpoint supports CUDA acceleration for optimal latency.
  • Calibration: For accurate 3D reconstruction, ensure your camera extrinsic and intrinsic matrices are calibrated using the provided Sub-pixel Calibration utility before attempting depth estimation tasks.
  • Data Privacy: Be mindful of local regulations when using visual grounding models on personally identifiable information.
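The calibration tip above matters because 3D reconstruction hinges on the pinhole camera model: a pixel plus a metric depth value maps to a 3D camera-space point only if the intrinsics (fx, fy, cx, cy) are accurate. A minimal sketch, assuming standard pinhole geometry (the parameter values below are placeholders, not outputs of the skill's calibration utility):

```python
# Sketch: unproject a pixel (u, v) with metric depth into camera space
# using calibrated intrinsics. Miscalibrated fx/fy/cx/cy skew all points.

def unproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) at `depth` metres to a 3D point (X, Y, Z)."""
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    return (X, Y, depth)

# The principal point maps straight down the optical axis:
print(unproject(320, 240, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0))
# -> (0.0, 0.0, 2.0)
```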

Metadata

Author: @zorrong
Stars: 879
Views: 1
Updated: 2026-02-11
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-zorrong-computer-vision-expert": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#computer-vision #yolo26 #sam3 #vlm #robotics
Safety Score: 4/5

Flags: code-execution