Official Verified

talking-head-production

Talking head video production with AI avatars, lipsync, and voiceover. Covers portrait requirements, audio quality, OmniHuman, PixVerse lipsync, Dia TTS. Use for: spokesperson videos, course content, social media, presentations, demos. Triggers: talking head, avatar video, lipsync, lip sync, ai spokesperson, virtual presenter, ai presenter, omnihuman, talking avatar, video presenter, ai talking head, presenter video, ai face video

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/okaris/talking-head-production

Download Source Code (.zip)

Talking Head Production

Create talking head videos with AI avatars and lipsync via inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

# Generate dialogue audio
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to our product tour. Today I will show you three features that will save you hours every week."
}'

# Create talking head video with OmniHuman
infsh app run bytedance/omnihuman-1-5 --input '{
  "image": "path/to/portrait.png",
  "audio": "path/to/dialogue.mp3"
}'

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available.

Portrait Requirements

The source portrait image is critical. Poor portraits = poor video output.

Must Have

Requirement	Why	Spec
Center-framed	Avatar needs face in predictable position	Face centered in frame
Head and shoulders	Body visible for natural gestures	Crop below chest
Eyes to camera	Creates connection with viewer	Direct frontal gaze
Neutral expression	Starting point for animation	Slight smile OK, not laughing/frowning
Clear face	Model needs to detect features	No sunglasses, heavy shadows, or obstructions
High resolution	Detail preservation	Min 512x512 face region, ideally 1024x1024+

Background

Type	When to Use
Solid color	Professional, clean, easy to composite
Soft bokeh	Natural, lifestyle feel
Office/studio	Business context
Transparent (via bg removal)	Compositing into other scenes

# Generate a professional portrait background
infsh app run falai/flux-dev-lora --input '{
  "prompt": "professional headshot photograph of a friendly business person, soft studio lighting, clean grey background, head and shoulders, direct eye contact, neutral pleasant expression, high quality portrait photography"
}'

# Or remove background from existing portrait
infsh app run <bg-removal-app> --input '{
  "image": "path/to/portrait-with-background.png"
}'

Audio Quality

Audio quality directly impacts lipsync accuracy. Clean audio = accurate lip movement.

Requirements

Parameter	Target	Why
Background noise	None/minimal	Noise confuses lipsync timing
Volume	Consistent throughout	Prevents sync drift
Sample rate	44.1kHz or 48kHz	Standard quality
Format	MP3 128kbps+ or WAV	Compatible with all tools

Generating Audio

Read Full Documentation on GitHub

Metadata

Author@okaris

Stars1287

Updated2026-02-22

View Author Profile

AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-okaris-talking-head-production": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

content-repurposing

Content atomization — turn one piece of content into many formats. Covers blog-to-thread, blog-to-carousel, podcast-to-blog, video-to-quotes, and more. Use for: content marketing, social media, multi-platform distribution, content strategy. Triggers: content repurposing, repurpose content, content atomization, content recycling, one to many content, multi platform content, cross post, adapt content, reformat content, blog to thread, blog to video, podcast to blog, content multiplication

okaris 1287

product-changelog

Product changelog and release notes that users actually read. Covers categorization, user-facing language, visuals, and distribution. Use for: release notes, changelogs, product updates, feature announcements, versioning. Triggers: changelog, release notes, product update, version notes, what's new, feature announcement, product changelog, update log, release announcement, version release, product release, ship notes

okaris 1287

logo-design-guide

Logo design principles and AI image generation best practices for creating logos. Covers logo types, prompting techniques, scalability rules, and iteration workflows. Use for: brand identity, startup logos, app icons, favicons, logo concepts. Triggers: logo design, create logo, brand logo, logo generation, ai logo, logo maker, icon design, brand mark, logo concept, startup logo, app icon logo

okaris 1287

product-photography

AI product photography with studio lighting, lifestyle shots, and packshot conventions. Covers angles, backgrounds, shadow types, hero shots, and e-commerce image requirements. Use for: product photos, e-commerce images, Amazon listings, packshots, lifestyle photography. Triggers: product photography, product photo, packshot, e-commerce photography, product shot, product image, studio photography, lifestyle product, amazon product photo, product listing image, hero shot, product mockup, commercial photography

okaris 1287

newsletter-curation

Newsletter curation with content sourcing, editorial structure, and subscriber growth strategies. Covers issue formatting, link roundups, commentary style, and sending cadence. Use for: email newsletters, link roundups, weekly digests, curated content, creator newsletters. Triggers: newsletter, email newsletter, newsletter curation, weekly digest, link roundup, curated newsletter, newsletter writing, newsletter format, subscriber growth, newsletter strategy, content curation, newsletter template

okaris 1287