Official Verified ai models Safety 4/5

volc-vision

使用火山引擎 ARK API 做图片理解、图片描述、视觉问答与图像分析。适用于用户发来图片并询问“这是什么”“图里有什么”“帮我看下这张图”“描述一下图片内容”“识别图片中的信息”等场景，也适用于需要对本地图片、图片 URL 或 base64 图片做理解和问答时。

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/big-dust/volc-vision

Download Source Code (.zip)

What This Skill Does

The volc-vision skill is a powerful computer vision integration designed to bridge the gap between image data and natural language understanding. By leveraging the advanced Volcano Engine (ARK) API, this skill allows your OpenClaw AI agent to 'see' and interpret images, photos, and visual data. It supports diverse input formats, including local file paths, publicly accessible image URLs, and base64-encoded strings, making it highly versatile for various automation workflows. The skill utilizes a prioritized model system, with the 'doubao-seed-1-6-vision-250815' model as the default flagship, ensuring high-quality, accurate analysis of complex images.

Installation

To integrate this skill into your environment, use the OpenClaw package manager: clawhub install openclaw/skills/skills/big-dust/volc-vision

Ensure you have your Volcano Engine ARK API key ready. Set it as an environment variable in your terminal session or your OpenClaw configuration file: export ARK_API_KEY="your_api_key_here"

Use Cases

Information Extraction: Automatically identify text or key data points within receipts, documents, or screenshots.
Content Moderation: Analyze incoming images to describe their context, objects, or themes.
Visual Q&A: Provide interactive feedback to users who upload photos and ask specific questions about the contents.
Accessibility: Generate descriptive alt-text or summaries for images shared in communication channels.
Automated Workflow Routing: Use visual input to trigger downstream business logic based on what is detected in the image.

Example Prompts

"Look at this screenshot and explain what error message is being displayed."
"I'm sending an image of my desk. What items can you identify in this photo?"
"Describe the overall design and color scheme of this interior design image."

Tips & Limitations

Model Selection: While the skill automatically selects the best model, you can override this by setting the VISION_MODEL environment variable to any of the supported models (e.g., doubao-1-5-vision-pro-32k-250115).
Performance: High-resolution images may consume more processing time; ensure inputs are optimized for the best experience.
Scope: This skill is strictly for image understanding. It does not possess image-generation capabilities; use dedicated imaging skills if your intent is to create visuals rather than analyze them.
Privacy: Since this skill transmits images to the ARK API for processing, ensure that sensitive or confidential images are handled in accordance with your organization's data privacy policies.

Read Full Documentation on GitHub

Metadata

Author@big-dust

Stars4473

Updated2026-05-01

View Author Profile

AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-big-dust-volc-vision": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags(AI)

#vision#ocr#image-analysis#volcano-engine#multimodal

Safety Score: 4/5

Flags: file-read, external-api

Related Skills

img-upload

将本地图片上传到 img.scdn.io 免费图床并返回公开链接。适用于用户需要把图片变成可分享 URL、上传生成结果、上传截图、上传本地图片供外链引用，或明确要求免费图床、图床、图片外链、分享链接时。若任务中已经有本地图片文件，且下一步需要分享、引用、粘贴到文档、消息或网页中，应优先考虑此技能。

big-dust 4473

feishu-bot-creator

创建飞书企业自建机器人，并完成权限导入、事件订阅、卡片回调和版本发布全流程。适用于创建飞书机器人、飞书应用机器人，或自动化完成飞书开放平台建机器人流程的场景。

big-dust 4473

artifact-organizer

按任务阶段而不是按文件类型整理混合产物。当编码、写作、脚本处理、研究、自动化或多步骤任务产生多个文件，且文件开始散落、临时产物与最终产物混在一起、目录结构不清晰，或用户提出“整理目录”“整理工作区”“归类”“归档”“收整产物”“文件有点乱”“帮我理一理结构”时，使用此技能。既支持任务开始前先规划 staged 目录结构，也支持对已经变乱的任务目录做清理和重组；遇到密码、token、API key、服务器凭据等精确敏感信息时，优先归入现有 secrets/ 体系。

big-dust 4473