Official Verified
Image Vision
Skill by cntuang
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/cntuang/image-vision
name: vision-analyze
description: Image analysis using multimodal vision models. Use when user needs to: (1) Describe what's in an image, (2) Extract text from images (OCR), (3) Analyze visual content, (4) Compare images, (5) Answer questions about images. Supports JPG, PNG, GIF, WebP formats.
metadata: { "openclaw": { "emoji": "👁️", "requires": {}, }, }
Vision Analyze
Analyze images using the built-in vision capabilities of multimodal AI models.
Quick Start
Analyze an Image
Describe what's in an image:
# The agent will automatically use vision when you provide an image path
image("/path/to/image.jpg", prompt="Describe what's in this image")
Extract Text (OCR)
Extract text from images:
image("/path/to/document.png", prompt="Extract all text from this image")
Analyze Multiple Images
Compare or analyze multiple images:
images(["/path/to/image1.jpg", "/path/to/image2.jpg"],
       prompt="Compare these two images and describe the differences")
Usage Patterns
Visual Q&A
Ask specific questions about image content:
image("menu.jpg", prompt="What are the prices of the main courses?")
image("chart.png", prompt="What trend does this graph show?")
image("screenshot.png", prompt="What error message is displayed?")
Content Moderation
Check whether an image is appropriate for a given context:
image("upload.jpg", prompt="Is this image appropriate for a professional setting?")
Data Extraction
Extract structured data from visual content:
image("receipt.jpg", prompt="Extract the date, total amount, and items purchased")
image("business_card.png", prompt="Extract name, phone, email, and company")
image("form.jpg", prompt="Extract all filled fields as key-value pairs")
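The extraction prompts above return free text, so a caller usually has to parse the reply before using it. The following is a minimal sketch, assuming the model answers with one "key: value" pair per line; that reply format is an assumption (the skill does not specify one), and `parse_key_value_reply` is a hypothetical helper, not part of the skill.

```python
def parse_key_value_reply(reply):
    """Parse a text reply of 'key: value' lines into a dict.

    Assumes one pair per line (an assumption, not a documented format).
    Lines without a colon are skipped; leading list bullets are dropped.
    """
    fields = {}
    for line in reply.splitlines():
        # Drop surrounding whitespace and a leading "-" or "*" bullet.
        line = line.strip().lstrip("-*").strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so values like "10:30" survive.
        key, value = line.split(":", 1)
        key = key.strip()
        if key:
            fields[key] = value.strip()
    return fields
```

Asking for a fixed layout in the prompt (for example, "one field per line as key: value") tends to make this kind of parsing more dependable, though the reply can still deviate and should be checked.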
Visual Comparison
Compare images:
images(["before.jpg", "after.jpg"],
       prompt="What changes were made between these two images?")
Tips
- Be specific: The more specific your prompt, the better the results
- Multiple images: You can analyze up to 20 images at once
- Supported formats: JPG, PNG, GIF, WebP
- Size limits: Large images are automatically resized
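The format and count limits in these tips can be checked before a batch is sent. This is a minimal sketch: `check_image_batch` is a hypothetical helper, the limits are copied from the tips above, and treating `.jpeg` as equivalent to JPG is an assumption.

```python
from pathlib import Path

# Limits taken from the Tips above; ".jpeg" as a JPG alias is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
MAX_IMAGES_PER_CALL = 20


def check_image_batch(paths):
    """Return a list of problems found in a batch of image paths.

    An empty list means the batch is within the documented limits.
    Only file extensions are inspected; file contents are not read.
    """
    problems = []
    if len(paths) > MAX_IMAGES_PER_CALL:
        problems.append(
            f"{len(paths)} images given, limit is {MAX_IMAGES_PER_CALL}"
        )
    for p in paths:
        # Compare extensions case-insensitively, so "photo.PNG" passes.
        if Path(p).suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported format: {p}")
    return problems
```

Running the check first lets a caller split an oversized batch or convert an unsupported file rather than discover the problem after the request.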
When to Use
- Reading text from screenshots, documents, or photos
- Describing visual content for accessibility
- Analyzing charts, graphs, or diagrams
- Comparing visual changes
- Extracting data from forms or receipts
- Understanding UI elements or error messages
Add to Configuration
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-cntuang-image-vision": {
"enabled": true,
"auto_update": true
}
}
}
Safety Note
ClawKit audits metadata but not runtime behavior. Use with caution.
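If clawhub.json already lists other plugins, pasting the configuration fragment by hand risks overwriting them. The sketch below merges the entry into an existing file instead. It assumes clawhub.json is plain JSON in the current directory, which the page does not state; `enable_plugin` and `update_config_file` are hypothetical helpers.

```python
import json
from pathlib import Path

# Plugin id and flags copied from the configuration fragment on this page.
PLUGIN_ID = "official-cntuang-image-vision"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of config with the plugin entry added.

    Existing plugins and unrelated top-level keys are preserved,
    and the input dict is left unmodified.
    """
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path="clawhub.json"):
    """Merge the plugin entry into the JSON file at path (assumed location)."""
    config_path = Path(path)
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    updated = enable_plugin(config)
    config_path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Keeping the merge in a pure function separate from the file I/O makes it easy to inspect the resulting configuration before anything is written to disk.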