What This Skill Does

The ms-qwen-vl skill integrates the powerful Qwen3-VL multimodal model from ModelScope directly into your OpenClaw workflow. It empowers your agent with advanced visual perception, allowing it to interpret images, extract text via OCR, detect specific objects, and perform complex visual reasoning. By leveraging an OpenAI-compatible SDK, this skill ensures consistent and high-performance communication with ModelScope's inference endpoints, providing both a standard speed-optimized mode and a high-precision 235B model for granular tasks.

Installation

To enable this skill, run the following command in your terminal: clawhub install openclaw/skills/skills/crocketc/ms-qwen-vl

Once installed, you must configure your environment variables. Copy the .env.example file to .env and provide your API key obtained from the ModelScope console. Set the MODELSCOPE_API_KEY variable to ensure the agent has authorization to access the visual inference services.

Use Cases

This skill is ideal for tasks requiring visual understanding, such as:

Automated Data Entry: Using OCR to transcribe handwritten notes or scanned invoices into digital formats.
Content Moderation & Analysis: Describing complex screenshots or analyzing visual data for reports.
Visual Q&A: Asking questions about specific elements within a dashboard or a complex diagram.
Asset Management: Detecting objects in images to categorize or tag assets effectively.

Example Prompts

"Can you perform an OCR scan on this invoice image located at D:\Documents\invoice.jpg and extract the total amount?"
"Describe the contents of this screenshot: C:\Users\Desktop\ui_design.png, and point out any alignment issues."
"Look at this chart image https://example.com/data.png and explain the trend shown in the visual representation."

Tips & Limitations

Input Handling: Always ensure the local file paths provided to the agent are correct; the underlying script automatically manages base64 conversion for optimal API transmission.
Performance: Use the standard mode for general tasks to minimize latency. If you require deep logical reasoning or high-accuracy analysis, append the --precise flag to trigger the 235B model.
Security: Be mindful when uploading images containing sensitive information to external APIs; ensure your ModelScope privacy settings align with your data security requirements.

ms-qwen-vl

Why use this skill?

Install via CLI (Recommended)

What This Skill Does

Installation

Use Cases

Example Prompts

Tips & Limitations

Metadata

Tags(AI)