Ai Media
Skill by bowen31337
What This Skill Does
The ai-media skill provides an interface for full-stack AI media generation that offloads heavy computation to remote GPU hardware (RTX 3090/3080/2070S). Designed for the OpenClaw ecosystem, it handles image creation through ComfyUI, video synthesis via AnimateDiff and LTX-2, talking-head animation via SadTalker, and voice synthesis using Voxtral. The skill acts as a bridge between the OpenClaw agent and a dedicated backend server, so generation tasks that would normally overwhelm local system resources run on specialized hardware instead.
Installation
To integrate this skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/bowen31337/ai-media
Update your SSH configuration with the required credentials as specified in the configuration documentation, and place the ~/.ssh/id_ed25519_gpu key where the agent can use it to communicate securely with the server. The skill relies on pre-configured directories on the GPU server under /data/ai-stack/ for all ComfyUI, SadTalker, and Voxtral operations.
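Before the agent attempts a connection, it can help to confirm that the key is present and not readable by other users. The following is a minimal sketch under stated assumptions: only the key path comes from the text above, while the host alias `gpu-server` and the helper names `check_key` and `build_ssh_command` are hypothetical and not part of the skill.

```python
import stat
from pathlib import Path

# Key path taken from the text above; everything else here is illustrative.
DEFAULT_KEY = Path("~/.ssh/id_ed25519_gpu")


def check_key(key_path: Path = DEFAULT_KEY) -> list:
    """Return a list of problems found with the private key file (empty if none)."""
    path = key_path.expanduser()
    if not path.is_file():
        return [f"key not found: {path}"]
    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    # OpenSSH rejects private keys that group or other users can access.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"permissions too open ({oct(mode)}); run: chmod 600 {path}")
    return problems


def build_ssh_command(host: str, remote_command: str, key_path: Path = DEFAULT_KEY) -> list:
    """Build an argv list for ssh; `host` is a hypothetical alias or user@host."""
    return [
        "ssh",
        "-i", str(key_path.expanduser()),
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        host,
        remote_command,
    ]


if __name__ == "__main__":
    for problem in check_key():
        print(problem)
    # "gpu-server" is a placeholder; substitute the host from your SSH config.
    print(build_ssh_command("gpu-server", "ls /data/ai-stack/"))
```

The command is returned as a list so it can be passed to `subprocess.run` without shell quoting issues. The permission check assumes a POSIX file system.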
Use Cases
- Content Creation: Rapidly generate photorealistic assets for marketing, social media posts, or game design using the Juggernaut XL model.
- Video Prototyping: Create short-form video content or animated GIFs using AnimateDiff motion modules.
- Virtual Avatars: Generate talking head videos for automated customer support or educational tutorials by pairing custom audio with specific avatar images.
- Voiceover Production: Convert long-form text into natural-sounding speech for accessibility tools or multimedia presentations.
Example Prompts
- "Generate a photorealistic portrait of an astronaut walking on Mars using the realistic style setting."
- "Create an 8-second video of a cyberpunk city street at night using the LTX-2 model."
- "Synthesize an audio file with a female voice saying 'Welcome to the open source future' in English."
Tips & Limitations
- Hardware Constraints: Generation times depend on the specific GPU assigned to your host. High-resolution images or long-duration videos may increase latency.
- Model Compatibility: Always ensure the input image dimensions for SadTalker match the recommended aspect ratios to prevent artifacts.
- Output Management: Files are saved to /data/ai-stack/output/. Remember to periodically clear this directory to prevent storage overflow on your GPU server.
- Language Support: Voice synthesis performance varies by language; always verify the language code against the supported list in the Voxtral documentation.
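The periodic cleanup mentioned under Output Management could be scripted on the GPU server along these lines. This is a sketch, not part of the skill: the function names and the 14-day threshold are assumptions chosen for illustration, and the script defaults to a dry run so nothing is deleted until you opt in.

```python
import time
from pathlib import Path
from typing import Optional

# Output directory named in the text above.
OUTPUT_DIR = Path("/data/ai-stack/output/")


def find_stale_files(directory: Path, max_age_days: float, now: Optional[float] = None) -> list:
    """Return regular files under `directory` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in directory.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def delete_files(paths: list, dry_run: bool = True) -> int:
    """Delete the given files; with dry_run=True only print what would be removed."""
    for path in paths:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return len(paths)


if __name__ == "__main__":
    # The 14-day threshold is an arbitrary example value.
    if OUTPUT_DIR.is_dir():
        delete_files(find_stale_files(OUTPUT_DIR, max_age_days=14), dry_run=True)
```

Review the dry-run output first, then call `delete_files(..., dry_run=False)` once the list looks right.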
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-bowen31337-ai-media": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: network-access, file-write, file-read, code-execution
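If clawhub.json already contains other plugin entries, pasting the fragment by hand risks overwriting them. The sketch below merges the entry instead. It assumes only the structure shown in the fragment above (a top-level "plugins" object keyed by plugin ID); the helper name `enable_plugin` is hypothetical, and the location of clawhub.json is not specified in this listing.

```python
import json


def enable_plugin(config: dict, plugin_id: str, auto_update: bool = True) -> dict:
    """Return a copy of `config` with the plugin entry added or updated."""
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": auto_update})
    plugins[plugin_id] = entry
    updated["plugins"] = plugins
    return updated


if __name__ == "__main__":
    # Start from an existing config with an unrelated, made-up entry.
    existing = {"plugins": {"some-other-plugin": {"enabled": False}}}
    merged = enable_plugin(existing, "official-bowen31337-ai-media")
    print(json.dumps(merged, indent=2))
```

The function copies the dictionaries it touches rather than mutating the input, so the original configuration stays available if the merged result needs to be discarded.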
Related Skills
Terse
Skill by bowen31337
Identity Resolver
Skill by bowen31337
whalecli
Agent-native whale wallet tracker for ETH and BTC chains. Track large crypto wallet movements, score whale activity, detect accumulation/distribution patterns, and stream real-time alerts. Integrates with FearHarvester and Simmer prediction markets for closed-loop signal→bet workflows. Use when: user asks about whale activity, on-chain signals, large wallet movements, smart money flows, or when pre-validating crypto trades/bets with on-chain data.
agent-self-governance
Self-governance protocol for autonomous agents: WAL (Write-Ahead Log), VBR (Verify Before Reporting), ADL (Anti-Divergence Limit), VFM (Value-For-Money), and IKL (Infrastructure Knowledge Logging). Use when: (1) receiving a user correction — log it before responding, (2) making an important decision or analysis — log it before continuing, (3) pre-compaction memory flush — flush the working buffer to WAL, (4) session start — replay unapplied WAL entries to restore lost context, (5) any time you want to ensure something survives compaction, (6) before claiming a task is done — verify it, (7) periodic self-check — am I drifting from my persona? (8) cost tracking — was that expensive operation worth it? (9) discovering infrastructure — log hardware/service specs immediately.
pyright-lsp
Python language server (Pyright) providing static type checking, code intelligence, and LSP diagnostics for .py and .pyi files. Use when working with Python code that needs type checking, autocomplete suggestions, error detection, or code navigation.