vlmrun-cli-skill
Use the VLM Run CLI (`vlmrun`) to interact with the Orion visual AI agent. Process images, videos, and documents with natural language. Triggers: image understanding/generation, object detection, OCR, video summarization, document extraction, visual AI chat, 'generate an image/video', 'analyze this image/video', 'extract text from', 'summarize this video', 'process this PDF'.
Why use this skill?
Use the VLM Run CLI skill to process images, videos, and documents with Orion AI. Enable advanced object detection, OCR, and media generation in OpenClaw.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/spillai/vlmrun-cli-skill
What This Skill Does
The vlmrun-cli-skill provides a direct interface to the Orion visual AI agent for multimodal processing from your terminal. Using the vlmrun binary, it lets OpenClaw analyze, interpret, and generate content from images, videos, and PDFs. It covers visual reasoning tasks such as object detection, document data extraction, video summarization, and generative media creation, all through natural language commands.
Installation
To enable this skill within your OpenClaw environment, execute the following command:
clawhub install openclaw/skills/skills/spillai/vlmrun-cli-skill
The vlmrun CLI must be installed first, via uv or pip. Set VLMRUN_API_KEY in your environment to authenticate with the Orion service. Optionally, set VLMRUN_BASE_URL and VLMRUN_CACHE_DIR to match your infrastructure.
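The prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the skill itself: the helper name `check_vlmrun_setup` is hypothetical, and it only verifies that a `vlmrun` binary is on PATH and that VLMRUN_API_KEY is non-empty. It does not validate the key against the Orion service.

```python
import os
import shutil


def check_vlmrun_setup(env=None):
    """Return a list of problems; an empty list means the basics are in place.

    Hypothetical helper for illustration. `env` defaults to the real
    process environment but can be any mapping, which makes it easy to test.
    """
    env = os.environ if env is None else env
    problems = []
    # The CLI binary must be discoverable on PATH.
    if shutil.which("vlmrun", path=env.get("PATH")) is None:
        problems.append("vlmrun binary not found on PATH")
    # The API key must be present and non-empty.
    if not env.get("VLMRUN_API_KEY"):
        problems.append("VLMRUN_API_KEY is not set")
    return problems
```

Calling `check_vlmrun_setup()` with no arguments inspects the current environment; printing the returned list gives a quick checklist of what is still missing.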
Use Cases
This skill is highly effective for developers, data analysts, and content creators. Typical use cases include:
- Automated Data Entry: Extracting structured information from invoices, receipts, or legal contracts.
- Media Analysis: Generating descriptive summaries for long-form video content or identifying objects within surveillance/test footage.
- Creative Workflows: Rapidly iterating on image and video generation prompts directly within a terminal-based workflow.
- Quality Assurance: Comparing "before and after" image states to detect visual regressions in UI testing.
Example Prompts
- "Analyze the attached invoice.pdf and return the total amount and merchant name in JSON format."
- "Look at these two images: photo1.jpg and photo2.jpg. Describe the key visual differences between them."
- "Summarize the key discussion points from this meeting video and create a list of action items."
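For prompts that ask for JSON, such as the invoice example above, the reply may need to be parsed before use. The sketch below assumes the reply is a text string that contains a JSON object, possibly surrounded by other text; the function name `extract_json_object` and the field names in the usage note are hypothetical, since the actual response shape depends on the prompt and the model.

```python
import json


def extract_json_object(reply: str) -> dict:
    """Parse the first JSON object embedded in a text reply.

    Hypothetical helper: tries each "{" in turn until one starts a
    valid JSON object, and raises ValueError if none does.
    """
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            obj, _end = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")
```

For instance, a reply like `Result: {"merchant_name": "Acme", "total_amount": 42.5}` would yield a dictionary with those two keys. Asking for explicit field names in the prompt makes the result easier to consume downstream.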
Tips & Limitations
- Performance: Large video files can take noticeably longer to process; consider using the --no-stream flag if your network connection is unstable.
- Caching: The default cache directory is ~/.vlmrun/cache/artifacts/. Regularly prune this directory to prevent disk bloat during heavy usage.
- Model Selection: Experiment with vlmrun-orion-1:fast for quick lookups and vlmrun-orion-1:pro for high-accuracy extraction tasks.
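The cache-pruning tip can be automated with a small script. This is a sketch under stated assumptions: the helper name `prune_cache` and the 30-day threshold are illustrative choices, not vlmrun features, and the function defaults to a dry run so nothing is deleted until you opt in.

```python
import time
from pathlib import Path


def prune_cache(cache_dir, max_age_days=30, dry_run=True):
    """Find files older than max_age_days and delete them unless dry_run.

    Hypothetical helper. Returns the list of stale files either way,
    so a dry run shows exactly what a real run would remove.
    """
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return []
    cutoff = time.time() - max_age_days * 86400
    # Compare each file's last-modified time against the cutoff.
    stale = [p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff]
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Running `prune_cache("~/.vlmrun/cache/artifacts/")` lists candidates without removing them; pass `dry_run=False` once the list looks right. If you have overridden the cache location, pass that path instead.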
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-spillai-vlmrun-cli-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: network-access, file-write, file-read, external-api
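If your clawhub.json already contains other settings, pasting the plugin fragment by hand risks overwriting them. The sketch below merges the entry programmatically; the helper name `enable_plugin` is hypothetical, and the location of clawhub.json is an assumption you should adjust to your setup.

```python
import json
from pathlib import Path


def enable_plugin(config_path, plugin_id, auto_update=True):
    """Add or update one entry under "plugins", leaving other keys intact.

    Hypothetical helper: creates the file if it does not exist yet.
    """
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

A call such as `enable_plugin("clawhub.json", "official-spillai-vlmrun-cli-skill")` produces the same entry as the fragment shown earlier while preserving any existing plugins and top-level keys.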