
ugc-manual

Generate lip-sync video from an image plus the user's own audio recording.

✅ USE WHEN:

  - User provides their OWN audio file (voice recording)
  - You want to sync an image to specific audio/voice
  - The user recorded the script themselves
  - Exact audio timing must be preserved

❌ DON'T USE WHEN:

  - User provides a text script (not audio) → use veed-ugc
  - AI needs to generate the voice → use veed-ugc
  - No audio file exists yet → use veed-ugc with a script

INPUT: Image + audio file (user's recording)
OUTPUT: MP4 video lip-synced to the provided audio
KEY DIFFERENCE: veed-ugc = script → AI voice → video; ugc-manual = user audio → video (no voice generation)

Why use this skill?

Use UGC-Manual to create high-quality lip-sync videos by combining static face images with your own voice recordings. Ideal for creators who want full control over the audio track.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/pauldelavallaz/ugc-manual

What This Skill Does

The UGC-Manual skill is a specialized tool for creators and developers looking to animate static images by synchronizing them with custom audio files. Unlike automated text-to-speech workflows, this skill prioritizes user control, allowing you to use your own voice recordings, professional voiceovers, or specific sound clips to drive the lip-syncing process. By utilizing the ComfyDeploy UGC-MANUAL workflow, the skill maps the phonemes and intensity of your provided audio directly to the facial structure detected in your chosen image, resulting in a cohesive MP4 video. This is an essential utility for creating personalized avatars, educational content, or character-driven social media posts where the specific emotional inflection and cadence of a human voice are required.
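Once installed, a typical run pairs one image with one audio file. The invocation below is a hypothetical sketch only; the run subcommand and the --image, --audio, and --output flag names are assumptions, not documented options, so check the skill's own help output for the real interface.

# Hypothetical invocation; subcommand and flag names are assumptions.
clawhub run pauldelavallaz/ugc-manual --image headshot.jpg --audio intro.mp3 --output sync_output.mp4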

Installation

To add UGC-Manual to your OpenClaw environment, first ensure ffmpeg is installed on your system; it is a mandatory dependency for audio transcoding and normalization. Once your environment is prepped, run the following command in your terminal: clawhub install openclaw/skills/skills/pauldelavallaz/ugc-manual. This pulls the necessary scripts and dependencies from the openclaw/skills repository. After installation, verify the setup by running the deployment script. The skill automatically converts common audio formats to the WAV PCM 16-bit mono 48 kHz format required by the underlying FabricLipsync engine.
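If you want to pre-convert a recording yourself, or verify what the automatic conversion produces, the ffmpeg equivalent is a one-liner. A minimal sketch, assuming an MP3 input named intro.mp3:

# Convert to WAV, PCM signed 16-bit little-endian, mono, 48 kHz.
ffmpeg -i intro.mp3 -ar 48000 -ac 1 -c:a pcm_s16le intro.wav

# Verify the result (should report pcm_s16le, 48000, 1).
ffprobe -v error -show_entries stream=codec_name,sample_rate,channels intro.wav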

Use Cases

UGC-Manual is ideal for scenarios where external control over the audio track is paramount. Common use cases include:

  1. Creating high-quality character avatars using pre-recorded professional voiceovers.
  2. Transforming personal voice messages from platforms like Telegram or WhatsApp into video content for social media (see the conversion sketch after this list).
  3. Synchronizing facial animations to non-speech audio, such as song lyrics or rhythmic sounds, for creative artistic projects.
  4. Rapid prototyping of character animations for games or marketing videos where you have already finalized the script recording and need to iterate on the visual face model.
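Voice notes from Telegram and WhatsApp typically arrive as Opus audio in an OGG container. The skill's automatic conversion should handle these, but if you want to pre-convert manually, a minimal sketch assuming a downloaded voice note named voice.ogg:

# Transcode an Opus voice note to the 48 kHz mono 16-bit WAV the engine expects.
ffmpeg -i voice.ogg -ar 48000 -ac 1 -c:a pcm_s16le voice.wav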

Example Prompts

  1. "I've recorded a voice memo in my local folder called intro.mp3. Use this audio and my headshot.jpg to generate a lip-sync video."
  2. "Please sync this face image (face_photo.png) with the audio file located at https://example.com/podcast_clip.wav and output the result as sync_output.mp4."
  3. "I have a custom character portrait and a pre-recorded professional voiceover. Can you use ugc-manual to create a video of the character speaking my recording?"

Tips & Limitations

For best results, use an image with a clear, high-resolution view of a face, preferably frontal or at a slight 3/4 angle; obstructed views or side profiles degrade the accuracy of the lip-sync mapping. Processing typically takes 2 to 5 minutes, depending largely on the length of the audio track. Because this skill does not include text-to-speech generation, supply clean audio with minimal background noise: the AI interprets all audible frequencies as part of the speech, so a noisy recording can produce artifacts in the mouth movement. Use this skill when you have your own audio; if you only have a text script, use the veed-ugc skill instead.
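If a recording is noisy, a light denoise pass before submission can reduce mouth-movement artifacts. A minimal sketch using ffmpeg's afftdn FFT denoiser (filter availability depends on your ffmpeg build; treat this as a starting point to tune, not a fixed recipe):

# Gentle FFT-based denoise, keeping the 48 kHz mono 16-bit PCM format.
ffmpeg -i intro.wav -af afftdn -ar 48000 -ac 1 -c:a pcm_s16le intro_clean.wav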

Metadata

Stars: 1217
Views: 3
Updated: 2026-02-20
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-pauldelavallaz-ugc-manual": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#lipsync #ai-video #audio-processing #avatars #ugc
Safety Score: 4/5

Flags: file-write, file-read, external-api, code-execution

Related Skills

morpheus-fashion-design

Generate professional advertising images with AI models holding/wearing products.

✅ USE WHEN:

  - Need a person/model in the image WITH a product
  - Creating fashion ads, product campaigns, commercial photography
  - Want a consistent model face across multiple shots
  - Need professional lighting/camera simulation
  - Input: product image + model reference (or catalog)

❌ DON'T USE WHEN:

  - Just editing/modifying an existing image → use nano-banana-pro
  - Product-only shot without a person → use nano-banana-pro
  - Already have the hero image, need variations → use multishot-ugc
  - Need video, not an image → use veed-ugc after generating the image
  - URL-based product fetch with a brand profile → use ad-ready instead

OUTPUT: Single high-quality PNG image (2K-4K resolution)

by pauldelavallaz · 1217 stars

veed-ugc

Generate UGC-style promotional videos with AI lip-sync. Takes an image (person with product from Morpheus/Ad-Ready) and a script (pure dialogue), creates a video of the person speaking. Uses ElevenLabs for voice synthesis.

by pauldelavallaz · 1217 stars

ad-ready

Generate advertising images automatically from a product URL + brand profile.

✅ USE WHEN:

  - User provides a product URL (e-commerce link)
  - Want automated product scraping + image generation
  - Have a brand profile to apply (70+ brands available)
  - Need funnel-stage targeting (awareness/consideration/conversion)
  - Want AI to auto-select model, scene, and lighting based on the brand

❌ DON'T USE WHEN:

  - User provides a local product image file → use morpheus-fashion-design
  - Don't need a person in the image → use nano-banana-pro
  - Want manual control over model, scene, and packs → use morpheus-fashion-design
  - Already have the hero image, need variations → use multishot-ugc
  - Need video output → use veed-ugc after image generation

INPUT: Product URL + brand name (optional) + funnel stage (optional)
OUTPUT: PNG advertising image with product + model

by pauldelavallaz · 1217 stars

sora

Generate videos from text prompts or reference images using OpenAI Sora.

✅ USE WHEN:

  - Need AI-generated video from a text description
  - Want image-to-video (animate a still image)
  - Creating cinematic/artistic video content
  - Need motion/animation without lip-sync

❌ DON'T USE WHEN:

  - Need lip-sync (person speaking) → use veed-ugc or ugc-manual
  - Just need image generation → use nano-banana-pro or morpheus
  - Editing existing videos → use Remotion
  - Need a UGC-style talking head → use veed-ugc

INPUT: Text prompt + optional reference image
OUTPUT: MP4 video (various resolutions/durations)

by pauldelavallaz · 1217 stars