llm-video-generator
Generate videos from text descriptions using the ZhipuAI CogVideoX-3 model. Supports text-to-video, image-to-video, and first/last-frame-to-video generation. Automatically handles long videos (over 5 s) by chaining multiple generation calls with last-frame continuation. Use when the user asks to create or generate a video from text, make a video, text-to-video, 文生视频 (text-to-video), 生成视频 (generate a video), 做个视频 (make a video), or makes any request that involves converting text or images into a video. Supports configuring video content, style, resolution (up to 4K), frame rate (30/60 fps), audio, and duration.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/baokui/llm-video-generator
What This Skill Does
llm-video-generator is an OpenClaw AI agent skill that turns text or image inputs into video using the ZhipuAI CogVideoX-3 model. It supports text-to-video, image-to-video, and first/last-frame-to-video generation. For long-form requests, the skill splits the video into 5-second segments, seeds each segment with the last frame of the previous one to keep the visuals stable, and concatenates the segments into a single video file.
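The segmenting described above can be sketched as follows. This is a minimal illustration, not the skill's actual code: `Segment` and `plan_segments` are hypothetical names, and the only facts taken from the description are the 5-second segment length and the last-frame continuation between segments.

```python
import math
from dataclasses import dataclass

# Taken from the description: one generation call covers up to 5 seconds.
SEGMENT_SECONDS = 5.0


@dataclass
class Segment:
    """One planned generation call (hypothetical structure, for illustration)."""
    index: int
    start: float
    end: float
    continues_from_previous: bool  # True when seeded with the prior segment's last frame


def plan_segments(total_seconds: float,
                  segment_seconds: float = SEGMENT_SECONDS) -> list[Segment]:
    """Split a requested duration into consecutive segments."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    count = math.ceil(total_seconds / segment_seconds)
    segments = []
    for i in range(count):
        start = i * segment_seconds
        # The final segment may be shorter than a full segment.
        end = min(start + segment_seconds, total_seconds)
        segments.append(Segment(i, start, end, continues_from_previous=i > 0))
    return segments
```

Under these assumptions, `plan_segments(12)` yields three segments covering 0 to 5 s, 5 to 10 s, and 10 to 12 s, with every segment after the first marked as continuing from the previous one.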
Installation
To integrate this skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/baokui/llm-video-generator
The skill runs its internal scripts, including the video concatenation and frame extraction tools, with /opt/anaconda3/bin/python3. Make sure this interpreter exists in your environment and that you have permission to execute it.
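A quick way to check the interpreter requirement before installing is sketched below. The path is the one named in this section; the helper name `interpreter_is_usable` is made up for illustration and is not part of the skill.

```python
import os

# Interpreter path named in the installation note above.
SKILL_PYTHON = "/opt/anaconda3/bin/python3"


def interpreter_is_usable(path: str = SKILL_PYTHON) -> bool:
    """Return True if the path is an existing file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    if interpreter_is_usable():
        print(f"{SKILL_PYTHON} is present and executable")
    else:
        print(f"{SKILL_PYTHON} is missing or not executable")
```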
Use Cases
- Marketing & Social Media: Quickly generate short promotional clips or B-roll for social media content from simple text scripts.
- Concept Visualization: Turn static images or storyboards into animated sequences to pitch design or film ideas.
- Education & Training: Create visual aids for complex topics where a static image is insufficient to show the progression of a process.
- Artistic Exploration: Generate surreal or high-fidelity cinematic clips based on artistic prompts or style descriptions.
Example Prompts
- "Make a 15-second cinematic video of a futuristic cyberpunk city at night with rain falling, 1080p, 30fps."
- "Generate a video showing a flower blooming in a desert, use this image [path/to/image.jpg] as the starting frame."
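- "做个视频:一只可爱的小猫在草地上追逐蝴蝶,风格要温馨,时长10秒。" (Make a video: a cute kitten chasing butterflies on the grass, in a warm and cozy style, 10 seconds long.)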
Tips & Limitations
- Patience is Key: High-definition video generation is resource-intensive. Always review the estimated time provided by the agent before starting.
- Consistency: When generating segments for long videos, ensure your prompts for subsequent segments explicitly describe the state of the characters/objects from the end of the previous segment to maintain continuity.
- Resolution: While 4K is supported, it significantly increases processing time. Use 1080p for draft versions to iterate faster.
- Limitations: The model generates 5-second chunks; avoid requesting single-shot videos longer than 30 seconds to maintain optimal coherence.
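The consistency tip above can be sketched as a small prompt builder. This is only an illustration of the idea: `build_segment_prompts` is a hypothetical helper, and the wording of the continuation prefix is an assumption, not something the skill prescribes.

```python
def build_segment_prompts(base_prompt: str, end_states: list[str]) -> list[str]:
    """Build one prompt per segment.

    end_states[i] describes how segment i ends, so N end states
    produce N + 1 prompts (the first segment uses the base prompt as-is).
    """
    prompts = [base_prompt]
    for state in end_states:
        # Restate where the previous segment ended so the next one picks up from there.
        prompts.append(f"Continuing from the previous shot, where {state}: {base_prompt}")
    return prompts


prompts = build_segment_prompts(
    "a kitten chases a butterfly across a meadow",
    ["the kitten is crouched in tall grass", "the butterfly has landed on a flower"],
)
for p in prompts:
    print(p)
```

Two end-state descriptions give three prompts, which matches a 15-second request split into 5-second chunks.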
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-baokui-llm-video-generator": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: external-api, file-read, file-write, code-execution
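If clawhub.json already contains other plugins, the fragment shown above has to be merged rather than pasted over the file. The sketch below shows one way to do that. It assumes clawhub.json is plain JSON with a top-level "plugins" object, as the fragment suggests; the function names are hypothetical and the file location is left to the caller.

```python
import json
from pathlib import Path

# Plugin key and settings copied from the configuration fragment above.
PLUGIN_KEY = "official-baokui-llm-video-generator"


def enable_plugin(config: dict) -> dict:
    """Add or overwrite the plugin entry, leaving other plugins untouched."""
    plugins = config.setdefault("plugins", {})
    plugins[PLUGIN_KEY] = {"enabled": True, "auto_update": True}
    return config


def update_config_file(path: Path) -> None:
    """Read the config if it exists, merge the plugin entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

Calling `update_config_file(Path("clawhub.json"))` from the directory that holds the file would apply the change in place.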
Related Skills
pdf-process-mineru
PDF document parsing tool based on local MinerU, supports converting PDF to Markdown, JSON, and other machine-readable formats.
Pdf Ocr Layout
Skill by baokui
wan-t2i
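Alibaba Cloud DashScope Wan2.6 text-to-image tool. Generates images with the Wan2.6-t2i model on Alibaba Cloud's Bailian platform. Triggered when the user needs AI image generation, text-to-image, or generating an image from text. Requires the DASHSCOPE_API_KEY environment variable (already configured in the system).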
glm-v-model
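Skill for calling the Zhipu GLM-4V/4.6V vision models. Used for image/video understanding, multimodal dialogue, chart analysis, and similar tasks. Use this skill when the user mentions image understanding, image recognition, vision models, GLM-4V, GLM-4.6V, multimodal analysis, describing a picture, chart analysis, or video understanding.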