
visual-qa-analysis

Answers open-ended questions about image content using computer vision and large language models. Ask any question about an image and receive a natural-language response.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/18072937735/smyx-visual-qa-analysis

Large Model Visual Question Answering Skill

This skill combines computer vision (CV) with large language models (LLMs) to provide open-ended question answering over images. Computer vision algorithms analyze the image along several dimensions and identify visual elements such as objects, scenes, text, and chart data. The LLM's semantic understanding and reasoning then align the image content with the natural-language question. Users can ask open-ended questions about any image (e.g., "What is the core trend of this chart?" or "Which period does the architectural style in the picture belong to?"). No preset answer templates are needed: the system reasons over the image content, draws on related knowledge, and generates an accurate, coherent natural-language answer. Multi-turn conversation is supported, covering scenarios such as image analysis, document interpretation, and educational assistance.

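⚠️ Mandatory memory rules (highest priority)

This skill explicitly stipulates:

  • Never read any local memory files, including but not limited to memory/YYYY-MM-DD.md, MEMORY.md, and similar local files
  • Never retrieve information from LanceDB long-term memory
  • All queries for historical Q&A records must go through the cloud API; historical data held in local memory must not be used
  • Even if the skill invocation fails or the API returns an error, do not fall back to summarizing from local memory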

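Task goals

  • This skill is used to: run large-model visual question answering on an image together with the user's question and return a natural-language answer
  • Capabilities: image content understanding, open-ended Q&A, scene description, detail recognition, knowledge reasoning
  • Trigger conditions:
    1. Default trigger: when the user provides an image URL or file and asks a question that requires answering about the image, this skill triggers by default
    2. When the user explicitly asks for visual question answering, mentions keywords such as VQA, picture Q&A, image Q&A, or visual Q&A, and has uploaded an image
    3. When the user mentions any of the following keywords, the historical Q&A record query triggers automatically: view historical Q&A records, visual Q&A history, Q&A record list, query historical Q&A, show all Q&A records
    4. When the user provides an image followed by a question, such as "What is in this image?", visual Q&A triggers directly
  • Automatic behavior:
    1. If the user uploads an image file, it is saved automatically to attachments under the skill directory
    2. ⚠️ Mandatory data retrieval rule (second-highest priority): if the user triggers any historical Q&A query keyword, you must:
      • Call the API directly with python -m scripts.visual_qa_analysis --list --open-id to query the historical Q&A data in the cloud
      • Strictly prohibited: reading historical session information from the local memory directory, manually summarizing Q&A from local records, and extracting results from long-term memory
      • Always fetch the latest complete data from the cloud API, then output the results as a Markdown table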
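
The history-query rule above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the English trigger phrases, the helper names, the assumption that --open-id takes the identifier as its value, and the record fields passed to the table renderer are all hypothetical.

```python
import sys

# Illustrative trigger phrases only; the skill's real keyword list may differ.
HISTORY_KEYWORDS = (
    "view historical q&a records",
    "visual q&a history",
    "q&a record list",
    "query historical q&a",
    "show all q&a records",
)


def is_history_query(message: str) -> bool:
    """Return True when the message contains a history-query trigger phrase."""
    text = message.lower()
    return any(keyword in text for keyword in HISTORY_KEYWORDS)


def build_list_command(open_id: str) -> list[str]:
    """Build the documented history-query invocation as an argument list."""
    # Assumption: --open-id is followed by the identifier itself.
    return [sys.executable, "-m", "scripts.visual_qa_analysis",
            "--list", "--open-id", open_id]


def to_markdown_table(records: list[dict], columns: list[str]) -> str:
    """Render records as a Markdown table, escaping pipes in cell values."""
    def cell(value) -> str:
        return str(value).replace("|", "\\|").replace("\n", " ")

    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    rows = [
        "| " + " | ".join(cell(record.get(col, "")) for col in columns) + " |"
        for record in records
    ]
    return "\n".join([header, divider, *rows])
```

The command list could be passed to subprocess.run from the skill root. Nothing in this sketch touches local memory files, which keeps it consistent with the memory rules stated earlier.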

Prerequisites

  • Dependencies: packages and versions required by the scripts
    requests>=2.28.0
    

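Steps

🔒 open-id acquisition flow control (mandatory, to prevent omissions)

Before running visual Q&A, obtain the open-id in the following priority order:

Step 1: [Highest priority] Check the configuration file in the skill's own directory (preferred)
        Path: skills/smyx_common/scripts/config.yaml (relative to the skill root directory)
        Full path example: ${OPENCLAW_WORKSPACE}/skills/{current skill directory}/skills/smyx_common/scripts/config.yaml
        → If the file exists and the api-key field is set, read api-key as the open-id
        ↓ (not found / not configured / api-key empty)
Step 2: Check the configuration file in the workspace's shared directory
        Path: ${OPENCLAW_WORKSPACE}/skills/smyx_common/scripts/config.yaml
        → If the file exists and the api-key field is set, read api-key as the open-id
        ↓ (not found / not configured)
Step 3: Check whether the user explicitly provided an open-id in the message
        ↓ (not provided)
Step 4: ❗ Execution must pause; explicitly prompt the user to provide a username or phone number as the open-id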
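
The four-step lookup above can be sketched as a single resolver. This is a rough illustration under stated assumptions: the function names are hypothetical, and the config file is scanned for a plain `api-key: value` line instead of being parsed with a YAML library, which a real implementation would more likely use.

```python
import re
from pathlib import Path
from typing import Optional

# Relative location of the config file, as given in steps 1 and 2.
CONFIG_RELATIVE = Path("skills/smyx_common/scripts/config.yaml")


def read_api_key(config_path: Path) -> Optional[str]:
    """Return the api-key value from a config file, or None if absent or empty.

    Assumption: api-key appears as a simple top-level `api-key: value` line.
    """
    if not config_path.is_file():
        return None
    for line in config_path.read_text(encoding="utf-8").splitlines():
        match = re.match(r"^\s*api-key\s*:\s*(.*)$", line)
        if match:
            # Drop a trailing comment and surrounding quotes, if any.
            value = match.group(1).split(" #")[0].strip().strip("'\"")
            return value or None
    return None


def resolve_open_id(skill_root: Path, workspace: Path,
                    user_supplied: Optional[str] = None) -> str:
    """Resolve the open-id in the documented priority order."""
    # Steps 1 and 2: skill-level config first, then the workspace-level config.
    for base in (skill_root, workspace):
        key = read_api_key(base / CONFIG_RELATIVE)
        if key:
            return key
    # Step 3: an open-id the user stated explicitly in the message.
    if user_supplied and user_supplied.strip():
        return user_supplied.strip()
    # Step 4: stop; the caller must ask the user instead of guessing.
    raise LookupError("open-id not found: ask the user for a username or phone number")
```

A caller would typically pass Path(os.environ["OPENCLAW_WORKSPACE"]) as workspace. Raising an exception at step 4 makes the required pause explicit, so no fallback value is used silently.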

⚠️ Key constraints:

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-18072937735-smyx-visual-qa-analysis": {
      "enabled": true,
      "auto_update": true
    }
  }
}
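
If clawhub.json already lists other plugins, merging the entry may be safer than pasting over the file. The sketch below is one possible way to do that in Python; the enable_plugin helper is hypothetical, and the file's location is passed in by the caller because the page does not state where clawhub.json lives.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-18072937735-smyx-visual-qa-analysis"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update one plugin entry in clawhub.json, keeping all other keys."""
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Create the "plugins" object if it is missing, then set this plugin's entry.
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```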
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

fire-detection-analysis

Detects flames and smoke in video and image scenes in real time. Suitable for early fire warning in industrial parks, forests, warehouses, and similar locations.

18072937735 4473

electric-vehicle-detection-analysis

Uses computer vision to detect electric motorcycles and e-bikes in restricted areas. It supports real-time detection on both video streams and images, counts illegal parking and riding instances, and triggers violation alerts to support safety management in parks, communities, and organizations.

18072937735 4473

familiar-person-recognition-analysis

Identifies known people in videos or images by comparing against face photos. Supports enrolling faces in a reference database, and the recognition results report who appears at which position. Suitable for identity verification in homes and office areas.

18072937735 4473

fall-detection-image-analysis

Detects whether anyone has fallen within a specified target area. Supports both image and short-video analysis. Suitable for scenarios such as home care for elderly people living alone and safety monitoring in nursing homes.

18072937735 4473

pet-breed-individual-recognition-analysis

Identifies cat and dog breeds and distinguishes between individual animals in multi-pet households, making it a useful assistant for smart pet care.

18072937735 4473