talking-head-production

Talking head video production with AI avatars, lipsync, and voiceover. Covers portrait requirements, audio quality, OmniHuman, PixVerse lipsync, Dia TTS. Use for: spokesperson videos, course content, social media, presentations, demos. Triggers: talking head, avatar video, lipsync, lip sync, ai spokesperson, virtual presenter, ai presenter, omnihuman, talking avatar, video presenter, ai talking head, presenter video, ai face video

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/okaris/talking-head-production

Talking Head Production

Create talking head videos with AI avatars and lipsync via the inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

# Generate dialogue audio
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to our product tour. Today I will show you three features that will save you hours every week."
}'

# Create talking head video with OmniHuman
infsh app run bytedance/omnihuman-1-5 --input '{
  "image": "path/to/portrait.png",
  "audio": "path/to/dialogue.mp3"
}'

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available.
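To reuse the OmniHuman step with different files, the JSON input can be built from shell variables. The sketch below is one way to do that; `omnihuman_input` is a hypothetical helper, not part of the `infsh` CLI, and it assumes the paths contain no double quotes or backslashes because it does no JSON escaping.

```shell
# Hypothetical helper (not part of infsh): print the JSON input for the
# OmniHuman step, using the same "image" and "audio" keys as the Quick Start.
# Assumption: neither path contains double quotes or backslashes,
# since no JSON escaping is performed here.
omnihuman_input() {
  printf '{"image": "%s", "audio": "%s"}\n' "$1" "$2"
}
```

It could then be passed to the same command shown in the Quick Start, for example `infsh app run bytedance/omnihuman-1-5 --input "$(omnihuman_input path/to/portrait.png path/to/dialogue.mp3)"`.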

Portrait Requirements

The source portrait largely determines output quality: a poor portrait produces a poor video.

Must Have

| Requirement | Why | Spec |
|---|---|---|
| Center-framed | Avatar needs face in predictable position | Face centered in frame |
| Head and shoulders | Body visible for natural gestures | Crop below chest |
| Eyes to camera | Creates connection with viewer | Direct frontal gaze |
| Neutral expression | Starting point for animation | Slight smile OK, not laughing/frowning |
| Clear face | Model needs to detect features | No sunglasses, heavy shadows, or obstructions |
| High resolution | Detail preservation | Min 512x512 face region, ideally 1024x1024+ |
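The resolution requirement can be checked before a portrait is submitted. The sketch below classifies a width and height against the 512 and 1024 pixel thresholds in the table; `portrait_size_class` is a hypothetical helper. It looks at whole-image dimensions only, so passing is necessary but not sufficient: the table's minimum applies to the face region, which this check does not measure.

```shell
# Hypothetical helper: classify image dimensions (in pixels) against the
# resolution thresholds from the requirements table.
# Prints "too-small" (below 512 on either side), "ideal" (1024+ on both
# sides), or "ok" (anything in between).
portrait_size_class() {
  width=$1
  height=$2
  if [ "$width" -lt 512 ] || [ "$height" -lt 512 ]; then
    echo "too-small"
  elif [ "$width" -ge 1024 ] && [ "$height" -ge 1024 ]; then
    echo "ideal"
  else
    echo "ok"
  fi
}
```

Assuming ImageMagick is installed, the dimensions can be read with `identify`, for example `portrait_size_class $(identify -format '%w %h' path/to/portrait.png)`.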

Background

| Type | When to Use |
|---|---|
| Solid color | Professional, clean, easy to composite |
| Soft bokeh | Natural, lifestyle feel |
| Office/studio | Business context |
| Transparent (via bg removal) | Compositing into other scenes |

# Generate a professional portrait with a clean background
infsh app run falai/flux-dev-lora --input '{
  "prompt": "professional headshot photograph of a friendly business person, soft studio lighting, clean grey background, head and shoulders, direct eye contact, neutral pleasant expression, high quality portrait photography"
}'

# Or remove background from existing portrait
infsh app run <bg-removal-app> --input '{
  "image": "path/to/portrait-with-background.png"
}'

Audio Quality

Audio quality directly affects lipsync accuracy: the cleaner the audio, the more accurate the lip movement.

Requirements

| Parameter | Target | Why |
|---|---|---|
| Background noise | None/minimal | Noise confuses lipsync timing |
| Volume | Consistent throughout | Prevents sync drift |
| Sample rate | 44.1kHz or 48kHz | Standard quality |
| Format | MP3 128kbps+ or WAV | Compatible with all tools |
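The sample rate and format targets are easy to verify before lipsync. The sketch below checks a sample rate (in Hz) and a file name against the table; `audio_spec_ok` is a hypothetical helper. It judges format by file extension only and does not check MP3 bitrate, noise, or volume consistency.

```shell
# Hypothetical helper: check a sample rate (Hz) and a file name against the
# sample rate and format targets. Prints "ok" and returns 0 on success;
# prints the reason and returns 1 otherwise.
# Assumption: the file extension reflects the actual container format.
audio_spec_ok() {
  rate=$1
  file=$2
  case "$rate" in
    44100|48000) ;;
    *) echo "sample rate $rate Hz is not 44.1 kHz or 48 kHz"; return 1 ;;
  esac
  case "$file" in
    *.mp3|*.wav) ;;
    *) echo "expected an MP3 or WAV file, got: $file"; return 1 ;;
  esac
  echo "ok"
}
```

Assuming FFmpeg is installed, the sample rate can be read with `ffprobe -v error -select_streams a:0 -show_entries stream=sample_rate -of default=noprint_wrappers=1:nokey=1 path/to/dialogue.mp3` and passed as the first argument.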

Generating Audio

Metadata

Author: @okaris
Stars: 1287
Views: 0
Updated: 2026-02-22

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-okaris-talking-head-production": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.
