Official Verified media Safety 4/5

llm-video-generator

Generate videos from text descriptions using the ZhipuAI CogVideoX-3 model. Supports text-to-video, image-to-video, and first/last-frame-to-video generation. Videos longer than 5 seconds are handled automatically by chaining multiple generation calls, each continuing from the last frame of the previous clip. Use when the user asks to create or generate a video from text, make a video, or do text-to-video (including Chinese requests such as 文生视频 "text-to-video", 生成视频 "generate a video", and 做个视频 "make a video"), or for any request that involves converting text or images into a video. Configurable options include video content, style, resolution (up to 4K), frame rate (30 or 60 fps), audio, and duration.
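The configurable options above can be pictured as a single request object. The sketch below is a minimal, hypothetical Python container with basic checks. The class name, field names, and mode labels are illustrative assumptions, not the skill's actual interface, and "4K" is assumed to mean 3840x2160 in either orientation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "4K" means 3840x2160 (landscape or portrait); real API limits may differ.
MAX_LONG_SIDE, MAX_SHORT_SIDE = 3840, 2160
SUPPORTED_FPS = (30, 60)
# Hypothetical labels for the three generation modes named in the description.
MODES = ("text", "image", "first_last_frame")


@dataclass
class VideoRequest:
    """Hypothetical bundle of the options the skill description lists."""
    prompt: str
    mode: str = "text"
    width: int = 1920
    height: int = 1080
    fps: int = 30
    with_audio: bool = False
    duration_s: float = 5.0
    first_frame: Optional[str] = None  # image path for image / first-last-frame modes
    last_frame: Optional[str] = None   # image path for first-last-frame mode only

    def validate(self) -> None:
        """Raise ValueError if an option is outside the described (or assumed) ranges."""
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.fps not in SUPPORTED_FPS:
            raise ValueError("fps must be 30 or 60")
        long_side = max(self.width, self.height)
        short_side = min(self.width, self.height)
        if long_side > MAX_LONG_SIDE or short_side > MAX_SHORT_SIDE:
            raise ValueError("resolution exceeds 4K")
        if self.duration_s <= 0:
            raise ValueError("duration must be positive")
        # Assumed rule: every mode except pure text-to-video needs a starting image.
        if self.mode != "text" and self.first_frame is None:
            raise ValueError(f"mode {self.mode!r} needs first_frame")
        if self.mode == "first_last_frame" and self.last_frame is None:
            raise ValueError("first/last-frame mode needs last_frame")


# Roughly matches the first example prompt on this page: 15 s, 1080p, 30 fps.
VideoRequest(prompt="futuristic cyberpunk city at night, rain", duration_s=15).validate()
```

Keeping validation in one place lets an agent reject an unsupported setting (for example 24 fps) before spending any generation time.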

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/baokui/llm-video-generator

What This Skill Does

The llm-video-generator skill is an OpenClaw AI agent skill that turns text descriptions and images into video clips using the ZhipuAI CogVideoX-3 model. It supports text-to-video, image-to-video, and sequence-based (first/last-frame) generation. For long videos, it breaks the request into 5-second segments, starts each new segment from the final frame of the previous one to keep the visuals consistent, and concatenates the segments into a single video file.
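The segment-chaining approach described above can be sketched as follows. This is a minimal Python outline, assuming three hypothetical hooks (generate_clip, extract_last_frame, concatenate) that stand in for the CogVideoX-3 call, the frame-extraction script, and the concatenation script. None of these names come from the skill itself.

```python
import math
from typing import Callable, List, Optional

SEGMENT_SECONDS = 5  # per the description, each generation call yields ~5 s


def generate_long_video(
    prompt: str,
    duration_s: float,
    generate_clip: Callable[[str, Optional[str]], str],  # hypothetical: (prompt, start image) -> clip path
    extract_last_frame: Callable[[str], str],            # hypothetical: clip path -> image path
    concatenate: Callable[[List[str]], str],             # hypothetical: clip paths -> final video path
) -> str:
    """Chain 5-second generations, seeding each one with the previous clip's last frame."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    n_segments = math.ceil(duration_s / SEGMENT_SECONDS)
    clips: List[str] = []
    start_image: Optional[str] = None  # the first segment is plain text-to-video
    for i in range(n_segments):
        clip = generate_clip(prompt, start_image)
        clips.append(clip)
        if i < n_segments - 1:
            # The last frame of this clip becomes the first frame of the next one.
            start_image = extract_last_frame(clip)
    return concatenate(clips)


if __name__ == "__main__":
    # Fake hooks so the control flow can be inspected without calling any API.
    seen: List[Optional[str]] = []

    def fake_generate(p: str, img: Optional[str]) -> str:
        seen.append(img)
        return f"clip{len(seen)}.mp4"

    result = generate_long_video(
        "rainy cyberpunk city", 12, fake_generate,
        lambda clip: clip.replace(".mp4", "_last.png"),
        lambda clips: "+".join(clips),
    )
    print(result)  # clip1.mp4+clip2.mp4+clip3.mp4
```

This sketch rounds up to whole 5-second segments and does not trim the result, so a 12-second request yields three clips (about 15 seconds of footage). Passing the hooks in as arguments keeps the chaining logic testable without touching the real API.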

Installation

To add this skill to your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/baokui/llm-video-generator

The skill's helper scripts, including the video concatenation and frame extraction tools, run under /opt/anaconda3/bin/python3, so make sure that interpreter exists and that your environment has permission to execute it.
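Before the first run, you can confirm that the interpreter is in place. This minimal Python check uses only the standard library. The function name is hypothetical, and it only verifies that the file exists and is executable by the current user, not that the skill's script dependencies are installed.

```python
import os

# Interpreter path taken from the skill's installation notes.
SKILL_PYTHON = "/opt/anaconda3/bin/python3"


def interpreter_ready(path: str = SKILL_PYTHON) -> bool:
    """True if `path` is an existing regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Report the status of the documented interpreter.
status = "available" if interpreter_ready() else "missing or not executable"
print(f"{SKILL_PYTHON}: {status}")
```

Any Python 3 interpreter can run this check; it does not need to be the Anaconda one it is looking for.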

Use Cases

Marketing & Social Media: Quickly generate short promotional clips or B-roll for social media content from simple text scripts.
Concept Visualization: Turn static images or storyboards into animated sequences to pitch design or film ideas.
Education & Training: Create visual aids for complex topics where a static image is insufficient to show the progression of a process.
Artistic Exploration: Generate surreal or high-fidelity cinematic clips based on artistic prompts or style descriptions.

Example Prompts

"Make a 15-second cinematic video of a futuristic cyberpunk city at night with rain falling, 1080p, 30fps."
"Generate a video showing a flower blooming in a desert, use this image [path/to/image.jpg] as the starting frame."
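"做个视频：一只可爱的小猫在草地上追逐蝴蝶，风格要温馨，时长10秒。" ("Make a video: a cute kitten chasing butterflies on the grass, in a warm, cozy style, 10 seconds long.")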

Tips & Limitations

Patience is Key: High-definition video generation is resource-intensive. Always review the estimated time provided by the agent before starting.
Consistency: When generating segments for long videos, ensure your prompts for subsequent segments explicitly describe the state of the characters/objects from the end of the previous segment to maintain continuity.
Resolution: While 4K is supported, it significantly increases processing time. Use 1080p for draft versions to iterate faster.
Limitations: Each generation call produces a 5-second chunk, so longer videos are built from several chained chunks; avoid requesting videos longer than 30 seconds in a single request to keep the result coherent.
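The arithmetic behind this limit is simple: each generation call covers 5 seconds, so a request needs ceil(duration / 5) calls, and the 30-second guideline corresponds to six segments. The helper below is a hypothetical sketch of that planning step, not part of the skill.

```python
import math
from typing import Tuple

SEGMENT_SECONDS = 5       # length of one generated chunk, per the skill docs
RECOMMENDED_MAX_S = 30    # coherence guideline from the Limitations tip


def plan_segments(duration_s: float) -> Tuple[int, bool]:
    """Return (number of 5 s generation calls, whether duration is within the guideline)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    n_segments = math.ceil(duration_s / SEGMENT_SECONDS)
    return n_segments, duration_s <= RECOMMENDED_MAX_S


# The example prompts on this page ask for 15 s and 10 s videos.
print(plan_segments(15))  # (3, True)
print(plan_segments(10))  # (2, True)
print(plan_segments(45))  # (9, False)
```

Because every extra segment is a separate generation call, the segment count is also a rough proxy for how long a request will take.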

Read Full Documentation on GitHub

Metadata

Author: @baokui

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-baokui-llm-video-generator": {
      "enabled": true,
      "auto_update": true
    }
  }
}
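If you prefer to script this step, the sketch below merges the entry into an existing clawhub.json instead of overwriting the whole file. It assumes clawhub.json is plain JSON at a path you supply; the file's location and any other keys it may hold are not documented here, and the function name is hypothetical.

```python
import json
from pathlib import Path

PLUGIN_KEY = "official-baokui-llm-video-generator"
PLUGIN_SETTINGS = {"enabled": True, "auto_update": True}


def enable_plugin(config_path: Path) -> dict:
    """Merge the plugin entry into clawhub.json, keeping any other plugins intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    plugins = config.setdefault("plugins", {})
    # Update rather than replace, so extra per-plugin keys survive.
    plugins.setdefault(PLUGIN_KEY, {}).update(PLUGIN_SETTINGS)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it as enable_plugin(Path("clawhub.json")) from the directory that holds your configuration; a missing file is created with only this plugin entry.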

Tags (AI)

#video #generative-ai #creative #multimedia

Safety Score: 4/5

Flags: external-api, file-read, file-write, code-execution

Related Skills

pdf-process-mineru

PDF document parsing tool based on local MinerU, supports converting PDF to Markdown, JSON, and other machine-readable formats.

baokui 4473

Pdf Ocr Layout

Skill by baokui

baokui 4473

wan-t2i

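Alibaba Cloud DashScope Wan2.6 text-to-image tool. Generates images with the Wan2.6-t2i model on Alibaba Cloud's Bailian platform. Triggered when the user needs AI-generated images, text-to-image, or image generation from text. Requires the DASHSCOPE_API_KEY environment variable (already configured on the system).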

baokui 4473

glm-v-model

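Skill for calling Zhipu's GLM-4V/4.6V vision models. Used for image and video understanding, multimodal dialogue, chart analysis, and similar tasks. Use this skill when the user mentions image understanding, image recognition, vision models, GLM-4V, GLM-4.6V, multimodal analysis, describing a picture, chart analysis, or video understanding.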

baokui 4473