What This Skill Does

The human-avatar skill connects the OpenClaw agent to Alibaba Cloud's visual intelligence services so that it can generate talking-head videos. Using the DashScope API and the LingMou (灵眸) platform, the agent turns input images and audio into digital human performances. It supports three animation modes: EMO (portrait-based talking-head animation driven by an audio clip), AA (Animate Anyone, for full-body motion), and LingMou (template-based enterprise digital human production). Typical uses include social media content, customer service video tutorials, and professional presentations.

Installation

To enable this skill, use the following command in your terminal: clawhub install openclaw/skills/skills/davideuler/human-avatar

Then install the required Python dependencies in your environment: pip install requests dashscope oss2 alibabacloud-lingmou20250527 alibabacloud-tea-openapi

You must also set environment variables holding credentials that are valid for the cn-beijing region: DASHSCOPE_API_KEY for DashScope, plus an Alibaba Cloud (Aliyun) access key ID and access key secret for LingMou services.
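
Before running the skill, it can help to confirm that the credentials are actually present in the environment. The following is a minimal sketch of such a check. Only DASHSCOPE_API_KEY is named in this document; the two access-key variable names are assumptions based on a common Alibaba Cloud convention, and missing_credentials is a hypothetical helper, not part of the skill.

```python
import os

# DASHSCOPE_API_KEY is named in the text above. The two access-key names
# below are ASSUMPTIONS (a common Alibaba Cloud convention) and may differ
# from what the skill actually reads; adjust them to match your setup.
REQUIRED_VARS = (
    "DASHSCOPE_API_KEY",
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
)


def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All expected credential variables are set.")
```

This check only confirms that the variables are non-empty; it cannot tell whether the keys are valid or belong to the cn-beijing region.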

Use Cases

Content Creation: Generate automated video commentary by uploading a portrait photo and a voice clip.
Marketing Automation: Create personalized video greetings or updates for customers using LingMou templates.
Animation & Entertainment: Use Animate Anyone (AA) to drive full-body character animations for games or storytelling.
Video Localization: Use VideoRetalk to replace the speech in existing clips with new audio while re-syncing the speaker's lip movements to match.

Example Prompts

"Please use the portrait photo at [URL] and the audio clip [URL] to generate an EMO talking head video."
"Create a professional greeting video using the LingMou template 'BS1b2WNnRMu4ouRzT4clY9Jhg' with the text: 'Welcome to our monthly product update.'"
"Can you perform a full-body animation on the character image [URL] using the motion data from this reference video [URL]?"

Tips & Limitations

Region Constraint: All operations must be performed in the cn-beijing region; cross-region credentials will fail.
Resource Limits: EMO audio clips must be under 60 seconds and 15MB. Ensure input images feature a clear, unobstructed, frontal face.
Input Hosting: Host all input files at publicly accessible HTTPS URLs; the API cannot read local file:// paths.
Quality Control: Remove background noise from audio files before submitting them; cleaner audio improves lip-sync accuracy.
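
The limits above can be checked locally before a job is submitted. The sketch below is one way to do that, under stated assumptions: check_emo_inputs and is_https_url are hypothetical helpers, the caller already knows the audio duration and size, and "15MB" is read as 15 MiB. It does not call any Alibaba Cloud API and cannot verify that a URL is actually reachable from the public internet.

```python
from urllib.parse import urlparse

MAX_AUDIO_SECONDS = 60               # "under 60 seconds" per the tips above
MAX_AUDIO_BYTES = 15 * 1024 * 1024   # "15MB", read here as 15 MiB (assumption)


def is_https_url(url):
    """True if the URL uses the https scheme and names a host."""
    parts = urlparse(url)
    return parts.scheme == "https" and bool(parts.netloc)


def check_emo_inputs(image_url, audio_url, audio_seconds, audio_bytes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for label, url in (("image", image_url), ("audio", audio_url)):
        if not is_https_url(url):
            problems.append(f"{label} must be an https:// URL, got: {url}")
    if audio_seconds >= MAX_AUDIO_SECONDS:
        problems.append(f"audio is {audio_seconds}s; it must be under {MAX_AUDIO_SECONDS}s")
    if audio_bytes >= MAX_AUDIO_BYTES:
        problems.append(f"audio is {audio_bytes} bytes; it must be under {MAX_AUDIO_BYTES}")
    return problems


if __name__ == "__main__":
    # A local file path and an over-long clip both get flagged.
    for problem in check_emo_inputs(
        "file:///tmp/portrait.png", "https://example.com/voice.wav", 75, 2_000_000
    ):
        print("-", problem)
```

Returning a list of problems rather than raising on the first one lets the agent report every issue to the user in a single pass.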
