Official | Verified | media | Safety 4/5

omnihuman-video

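Generate audio-driven lip-sync videos with OmniHuman v1.5. Use this skill when the user wants to make a person in an image speak, add a voice-over, lip-sync a clip, or mentions omnihuman.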

Why use this skill?

Generate lip-synced talking videos from static portraits using OmniHuman v1.5 with OpenClaw.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/hexiaochun/omnihuman-video

What This Skill Does

The omnihuman-video skill lets OpenClaw use ByteDance's OmniHuman v1.5 model to turn a static portrait into a lifelike talking video driven by audio. Given an image and an audio file, it produces facial animation, lip-sync, and expression rendering. The skill manages the entire pipeline, from task submission and status tracking to delivery of the final rendered video, for uses such as presentation videos, social media content, and personalized avatars.
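The submit-then-track flow described above can be pictured as a simple polling loop. The following is only a minimal sketch, not the skill's actual implementation: `wait_for_video`, the `get_status` callable, the `"completed"`/`"failed"` status strings, and the `video_url` and `error` fields are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict


def wait_for_video(get_status: Callable[[], Dict[str, str]],
                   interval: float = 2.0,
                   timeout: float = 300.0) -> str:
    """Poll a task until it finishes and return the video URL.

    get_status is a caller-supplied function (hypothetical) that returns
    a dict with a "status" key and, once completed, a "video_url" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = get_status()
        if task["status"] == "completed":
            return task["video_url"]
        if task["status"] == "failed":
            raise RuntimeError(task.get("error", "generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        time.sleep(interval)  # wait before asking again
```

Passing the status check in as a callable keeps the loop independent of whichever client actually talks to the processing backend.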

Installation

To install this skill, use the following command in your terminal or OpenClaw management console:

clawhub install openclaw/skills/skills/hexiaochun/omnihuman-video

Ensure that you have the necessary API credentials configured in your environment, as this skill interacts with the fal.ai infrastructure for processing.

Use Cases

  • Virtual Presentations: Convert a professional headshot into a video presentation by uploading an audio script.
  • Content Creation: Bring characters to life for social media or marketing campaigns without expensive video production equipment.
  • Educational Content: Create engaging, consistent tutor avatars for remote learning materials.
  • Personalized Messaging: Send personalized birthday or greeting messages where the subject appears to speak the custom audio.

Example Prompts

  1. "Use this image [link] and this audio file [link] to generate a professional 1080p talking head video for my team presentation."
  2. "I need a video of the person in this image saying 'Welcome to our platform' using OmniHuman; please use 720p for faster results."
  3. "Create an AI-driven video using the provided portrait and the TTS audio clip I just generated. Make sure the lips are perfectly synced."

Tips & Limitations

  • Image Quality: Always use clear, high-resolution front-facing or semi-profile portraits. Avoid blurry images or pictures where the face is obstructed.
  • Audio Clarity: For the best results, use studio-quality voice recordings. Heavy background music or excessive environmental noise may degrade the lip-sync quality.
  • Resolution vs. Time: Remember that 1080p is limited to 30 seconds of audio, while 720p supports up to 60 seconds. Choose the resolution based on your video length requirements.
  • Performance: The 'turbo_mode' can be toggled for faster generation, but if visual fidelity is your priority, keep it set to false to ensure maximum model quality.
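The resolution limits in the tips above can be turned into a small pre-flight check. This is a minimal sketch under stated assumptions: the function name `pick_resolution` is hypothetical, and treating the 30-second and 60-second limits as inclusive is a guess, since the page does not say.

```python
def pick_resolution(audio_seconds: float) -> str:
    """Choose the highest resolution whose audio limit fits the clip.

    Limits come from the tips above: 1080p up to 30 s, 720p up to 60 s.
    Whether the boundaries are inclusive is an assumption.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    if audio_seconds <= 30:
        return "1080p"
    if audio_seconds <= 60:
        return "720p"
    raise ValueError("audio longer than 60 s exceeds the documented limits")
```

For example, a 45-second narration would fall back to 720p, while anything over a minute would need to be trimmed or split before submission.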

Metadata

Stars: 2387
Views: 2
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-hexiaochun-omnihuman-video": {
      "enabled": true,
      "auto_update": true
    }
  }
}
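If clawhub.json already lists other plugins, pasting the snippet over the file would drop them. The sketch below shows one way to merge the entry instead; the helper name `enable_plugin` and the `some-other-plugin` entry are hypothetical, and only the plugin key and its two fields come from the snippet above.

```python
import json

# Plugin key taken from the configuration snippet above.
PLUGIN_ID = "official-hexiaochun-omnihuman-video"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of config with the plugin enabled, keeping other plugins."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


# Hypothetical existing configuration with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
print(json.dumps(enable_plugin(existing), indent=2))
```

To apply this to a real file, load it with `json.load`, pass the result through `enable_plugin`, and write it back with `json.dump`.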

Tags

#image-to-video #lipsync #audio-driven #omnihuman
Safety Score: 4/5

Flags: external-api