human-avatar
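Generate talking head videos with the Alibaba Cloud DashScope/LingMou (灵眸) API. Three modes are supported: EMO (portrait plus audio-driven talking head, a two-step flow), AA/Animate Anyone (full-body animation), and LingMou (template-based digital human talking head videos). This skill is triggered when the user needs a talking head video, a digital human video, EMO/AA face animation, or VideoRetalk person replacement in video.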
Why use this skill?
Use the OpenClaw human-avatar skill to generate talking head videos, EMO animations, and digital human videos through Alibaba Cloud's video AI services.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/davideuler/human-avatar
What This Skill Does
The human-avatar skill connects the OpenClaw agent to Alibaba Cloud's visual intelligence services so that it can generate talking head videos. Through the DashScope API and the LingMou (灵眸) platform, the agent turns images and audio into digital human performances. Three animation modes are supported: EMO (portrait plus audio-driven talking head), AA (Animate Anyone, for full-body motion), and LingMou (template-based enterprise digital human production). Typical outputs include social media content, customer service video tutorials, and professional presentations.
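The skill description calls EMO a two-step flow. A minimal sketch of what those two steps might look like as raw DashScope HTTP requests follows. The endpoint paths, the model names (emo-detect-v1, emo-v1), the field names (face_bbox, ext_bbox, style_level), and the X-DashScope-Async header are assumptions, not taken from this page, and should be checked against the current DashScope EMO documentation before use. The helper names are hypothetical.

```python
import json
import urllib.request

# All endpoint paths, model names, and field names below are assumptions
# to be checked against the current DashScope EMO documentation.
BASE = "https://dashscope.aliyuncs.com/api/v1"
DETECT_URL = BASE + "/services/aigc/image2video/face-detect"
SYNTH_URL = BASE + "/services/aigc/image2video/video-synthesis"


def build_detect_request(image_url, ratio="1:1"):
    """Step 1: ask the service to check the portrait and locate the face."""
    return {
        "model": "emo-detect-v1",
        "input": {"image_url": image_url},
        "parameters": {"ratio": ratio},
    }


def build_synthesis_request(image_url, audio_url, face_bbox, ext_bbox):
    """Step 2: request the video, reusing the boxes returned by step 1."""
    return {
        "model": "emo-v1",
        "input": {
            "image_url": image_url,
            "audio_url": audio_url,
            "face_bbox": face_bbox,
            "ext_bbox": ext_bbox,
        },
        "parameters": {"style_level": "normal"},
    }


def post_json(url, payload, api_key, run_async=False):
    """POST a JSON payload with a bearer token and return the decoded reply."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if run_async:
        headers["X-DashScope-Async"] = "enable"  # assumed async switch
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Under these assumptions, a caller would send the step 1 payload to DETECT_URL, read the two bounding boxes from the reply, and pass them into the step 2 payload sent to SYNTH_URL with run_async=True. The second call is assumed to return a task identifier that must be polled until the video is ready; that polling loop is left out of the sketch.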
Installation
To enable this skill, use the following command in your terminal:
clawhub install openclaw/skills/skills/davideuler/human-avatar
Ensure you have the following dependencies installed in your environment:
pip install requests dashscope oss2 alibabacloud-lingmou20250527 alibabacloud-tea-openapi
You must also set environment variables holding valid credentials for the cn-beijing region: your DASHSCOPE_API_KEY for DashScope, and your Aliyun ACCESS_KEY_ID and SECRET for the LingMou services.
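Before running the skill, it can help to confirm that the credentials are actually present. The sketch below does that check in Python. Only DASHSCOPE_API_KEY is named exactly in the text above; the two access-key variable names are assumptions (the page does not spell them out), and missing_credentials is a hypothetical helper.

```python
import os

# DASHSCOPE_API_KEY is named in the text above; the two access-key variable
# names below are assumptions, so adjust them to match your setup.
REQUIRED_VARS = (
    "DASHSCOPE_API_KEY",
    "ALIBABA_CLOUD_ACCESS_KEY_ID",      # assumed name
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",  # assumed name
)


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling missing_credentials() with no argument inspects the real environment; an empty list means every variable is set. The check cannot tell whether the credentials belong to the cn-beijing region, so that still has to be verified in the Alibaba Cloud console.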
Use Cases
- Content Creation: Generate automated video commentary by uploading a portrait photo and a voice clip.
- Marketing Automation: Create personalized video greetings or updates for customers using LingMou templates.
- Animation & Entertainment: Use Animate Anyone (AA) to drive full-body character animations for games or storytelling.
- Video Localization: Utilize VideoRetalk to replace the actors in existing clips while maintaining realistic lip-syncing.
Example Prompts
- "Please use the portrait photo at [URL] and the audio clip [URL] to generate an EMO talking head video."
- "Create a professional greeting video using the LingMou template 'BS1b2WNnRMu4ouRzT4clY9Jhg' with the text: 'Welcome to our monthly product update.'"
- "Can you perform a full-body animation on the character image [URL] using the motion data from this reference video [URL]?"
Tips & Limitations
- Region Constraint: All operations must be performed in the cn-beijing region; cross-region credentials will fail.
- Resource Limits: EMO audio clips must be under 60 seconds and 15MB. Ensure input images feature a clear, unobstructed, frontal face.
- Input Hosting: Host input files at publicly accessible HTTPS URLs; the API cannot read file:// paths directly.
- Quality Control: Pre-process your audio files to remove background noise to significantly improve lip-sync accuracy.
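The limits in the tips above can be checked locally before anything is uploaded. The following is a minimal sketch using only the Python standard library. It assumes the audio is a WAV file (other formats would need a different duration check), treats "15MB" as 15 MiB, and uses hypothetical helper names.

```python
import os
import wave
from urllib.parse import urlparse

MAX_AUDIO_SECONDS = 60              # limit stated in the tips above
MAX_AUDIO_BYTES = 15 * 1024 * 1024  # "15MB", assumed to mean 15 MiB


def is_https_url(url):
    """True if the URL uses the https scheme and has a host part."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)


def check_wav_limits(path):
    """Return a list of problems found in a local WAV file (empty if fine)."""
    problems = []
    if os.path.getsize(path) >= MAX_AUDIO_BYTES:
        problems.append("file is 15 MB or larger")
    with wave.open(path, "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    if seconds >= MAX_AUDIO_SECONDS:
        problems.append(f"audio lasts {seconds:.1f} s, limit is under 60 s")
    return problems
```

is_https_url only inspects the form of the URL, so it rejects file:// and http:// inputs but cannot confirm that the file is publicly reachable. Image quality (a clear, frontal face) is not checked here either.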
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-davideuler-human-avatar": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, external-api