Official Verified media Safety 4/5

wavespeed-infinitetalk

Generate talking head videos from a portrait image and audio using WaveSpeed AI's InfiniteTalk model. Produces lip-synced video up to 10 minutes long at 480p or 720p. Supports optional mask images to target specific faces and text prompts for additional guidance. Use when the user wants to animate a face with audio or create talking avatar videos.

What This Skill Does

The wavespeed-infinitetalk skill generates talking head videos with WaveSpeed AI's InfiniteTalk model. Given a static portrait image and an audio file, it performs lip synchronization and facial animation so that the subject appears to speak the audio. Output is 480p or 720p, an optional mask image selects which face to animate, and an optional text prompt guides the character's expressions. Because clips can run up to 10 minutes, it suits creators, marketers, and developers who want to animate avatars or still photographs for long-form content.

Installation

To integrate this skill into your OpenClaw environment, execute the following command in your terminal:

clawhub install openclaw/skills/skills/chengzeyi/wavespeed-infinitetalk-avatar

Before using the skill, set the WAVESPEED_API_KEY environment variable to your WaveSpeed API key. You can obtain credentials from the official WaveSpeed portal.
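A quick pre-flight check can confirm that the key is visible to the process before any generation is attempted. The sketch below is illustrative only: the helper name get_wavespeed_api_key is hypothetical and not part of the skill; the only detail taken from this page is the WAVESPEED_API_KEY variable name.

```python
import os


def get_wavespeed_api_key(env=None):
    """Return the key stored in WAVESPEED_API_KEY, or raise if it is unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ and can
    be replaced with a plain dict when testing.
    """
    env = os.environ if env is None else env
    key = env.get("WAVESPEED_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WAVESPEED_API_KEY is not set; export it before using the skill."
        )
    return key
```

Calling get_wavespeed_api_key() at startup surfaces a missing key as a clear error instead of a failed request later on.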

Use Cases

Corporate Communications: Generate personalized training videos or announcements using a consistent brand avatar.
Content Creation: Animate characters for storytelling or YouTube intros without requiring expensive motion capture gear.
Education: Create engaging educational content in which a portrait of a historical figure or academic presenter delivers a recorded explanation.
Virtual Assistants: Deploy custom-branded talking heads for interactive digital experiences.

Example Prompts

"Animate this portrait using the audio file 'morning-announcement.mp3' and make sure the output resolution is set to 720p."
"I have a group photo and a speech clip. Use the provided mask image to animate only the person on the left, so that the person on the right stays still."
"Generate a talking head video using 'avatar.png' and 'marketing-pitch.mp3'. Add a prompt to ensure the expression is professional and friendly."

Tips & Limitations

Resolution: While 720p provides higher quality, 480p is recommended for faster generation times during testing.
Masking: When working with images containing multiple people, ensure your mask image covers only the face and immediate skin area. Using a full-image mask will cause the rendering to fail, resulting in a black video.
Prompts: Use concise English prompts to guide the animation style; excessive, complex descriptions can lead to unexpected artifacts.
Audio Length: The model accepts audio up to 10 minutes long, but processing time grows with clip length, so expect long clips to take significantly longer to generate.
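The limits above can be checked locally before a job is submitted. The following is a minimal sketch, not the skill's own validation: check_request, its parameters, the 40-word prompt threshold, and the 0.99 mask-coverage threshold are all assumptions made for illustration. Only the 480p/720p options, the 10-minute audio limit, and the warning about full-image masks come from this page. The mask_coverage argument is assumed to be the fraction of mask pixels that are selected, computed elsewhere.

```python
ALLOWED_RESOLUTIONS = {"480p", "720p"}  # options listed in the skill description
MAX_AUDIO_SECONDS = 10 * 60             # 10-minute limit from the description
MAX_PROMPT_WORDS = 40                   # assumed threshold for a "concise" prompt
FULL_MASK_THRESHOLD = 0.99              # assumed cutoff for a "full-image" mask


def check_request(resolution, audio_seconds, prompt="", mask_coverage=None):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if audio_seconds <= 0 or audio_seconds > MAX_AUDIO_SECONDS:
        problems.append(f"audio length out of range: {audio_seconds} s")
    if len(prompt.split()) > MAX_PROMPT_WORDS:
        problems.append("prompt is long; shorten it to reduce artifacts")
    if mask_coverage is not None and mask_coverage >= FULL_MASK_THRESHOLD:
        problems.append("mask covers the whole image; restrict it to the face")
    return problems
```

For example, check_request("720p", 120.0, "professional and friendly") returns an empty list, while a 1080p request with 700 seconds of audio and a full-image mask returns three problems.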

Metadata

Author: @chengzeyi

Stars: 3840

Updated: 2026-04-06

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-chengzeyi-wavespeed-infinitetalk-avatar": {
      "enabled": true,
      "auto_update": true
    }
  }
}
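If clawhub.json already contains other plugins, the entry has to be merged rather than pasted over the file. The sketch below shows one way to do that with the Python standard library. It assumes clawhub.json is a plain JSON file with a top-level "plugins" object, as the snippet above suggests; the function names and the default file location are hypothetical.

```python
import copy
import json
from pathlib import Path

PLUGIN_ID = "official-chengzeyi-wavespeed-infinitetalk-avatar"


def with_plugin_enabled(config):
    """Return a copy of `config` with the plugin entry added or overwritten."""
    updated = copy.deepcopy(config)
    plugins = updated.setdefault("plugins", {})
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    return updated


def update_config_file(path="clawhub.json"):
    """Merge the plugin entry into the file at `path` (assumed location)."""
    config_path = Path(path)
    if config_path.exists():
        existing = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        existing = {}
    merged = with_plugin_enabled(existing)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

Keeping the merge in a pure function leaves existing plugin entries untouched and makes the behavior easy to verify without touching the file system.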

Tags (AI)

#talking-head #ai-avatar #video-generation #lip-sync #multimedia

Safety Score: 4/5

Flags: file-read, external-api

Related Skills

wavespeed-watermark-remover

Remove watermarks, logos, captions, and text overlays from images and videos using WaveSpeed AI. Intelligently detects and removes watermarks while preserving texture and background. Supports images and videos up to 10 minutes. Use when the user wants to remove watermarks or text overlays from media.

Author: chengzeyi, Stars: 3840

wavespeed-face-swapper

Swap faces in images and videos using WaveSpeed AI. Supports image face swap and video face swap with multi-face targeting. Produces watermark-free results with automatic lighting and skin tone adaptation. Use when the user wants to replace a face in an image or video with another face.

Author: chengzeyi, Stars: 3840

wavespeed-minimax-speech-26

Convert text to speech using MiniMax Speech 2.6 Turbo via WaveSpeed AI. Features ultra-human voice cloning, sub-250ms latency, 40+ languages, emotion control, and 200+ voice presets. Use when the user wants to generate speech audio from text.

Author: chengzeyi, Stars: 3840

wavespeed-seedream-45

Generate and edit images using ByteDance's Seedream V4.5 model via WaveSpeed AI. Supports text-to-image generation and multi-image editing with custom resolutions up to 4096x4096. Features enhanced typography for posters and logos. Use when the user wants to create or edit images with high-quality text rendering.

Author: chengzeyi, Stars: 3840

wavespeed-nano-banana-2

Generate and edit images using Google's Nano Banana 2 model via WaveSpeed AI. Supports text-to-image generation and image editing with natural language prompts. Features native 4K resolution, flexible aspect ratios including ultra-narrow (1:8, 8:1), multilingual text rendering, and camera-style controls. Use when the user wants to create images from text or edit existing images.

Author: chengzeyi, Stars: 3840