Official Verified media Safety 3/5

AI Media

Skill by bowen31337

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/bowen31337/ai-media

What This Skill Does

The ai-media skill lets an OpenClaw agent run full-stack AI media generation on remote GPU hardware (RTX 3090/3080/2070S) instead of on the local machine. It manages image creation through ComfyUI, video synthesis via AnimateDiff and LTX-2, talking-head animation via SadTalker, and voice synthesis using Voxtral. The skill acts as a bridge between the OpenClaw agent and a dedicated backend server, so generation tasks that would normally overwhelm local system resources run on specialized hardware.

Installation

To integrate this skill into your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/bowen31337/ai-media

Update your SSH configuration with the credentials specified in the configuration documentation, and place the ~/.ssh/id_ed25519_gpu key so the agent can communicate securely with the server. The skill relies on pre-configured directories on the GPU server under /data/ai-stack/ for all ComfyUI, SadTalker, and Voxtral operations.
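As a rough pre-flight check for the SSH setup described above, the following Python sketch verifies that the key file exists with restrictive permissions and builds an ssh argument list. It is not part of the skill: the function names are hypothetical, and "gpu-server" is a placeholder host alias that you would define in your own SSH configuration.

```python
import stat
from pathlib import Path


def check_private_key(path):
    """Return a list of problems found with an SSH private key file."""
    key = Path(path).expanduser()
    if not key.is_file():
        return [f"{key} does not exist or is not a regular file"]
    problems = []
    mode = stat.S_IMODE(key.stat().st_mode)
    # OpenSSH rejects private keys that group or others can access.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{key} has mode {mode:o}; expected 600 or stricter")
    return problems


def build_ssh_command(host, remote_command, key_path="~/.ssh/id_ed25519_gpu"):
    """Build an argv list for running one command on the GPU server."""
    return [
        "ssh",
        "-i", str(Path(key_path).expanduser()),
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        host,  # assumption: a host alias from your own SSH config
        remote_command,
    ]


if __name__ == "__main__":
    for problem in check_private_key("~/.ssh/id_ed25519_gpu"):
        print("warning:", problem)
    # "gpu-server" is a placeholder, not a name defined by the skill.
    print(build_ssh_command("gpu-server", "ls /data/ai-stack/"))
```

The returned list can be passed to subprocess.run once the host alias resolves; keeping it as a list avoids shell quoting issues on the local side.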

Use Cases

Content Creation: Rapidly generate photorealistic assets for marketing, social media posts, or game design using the Juggernaut XL model.
Video Prototyping: Create short-form video content or animated GIFs using AnimateDiff motion modules.
Virtual Avatars: Generate talking head videos for automated customer support or educational tutorials by pairing custom audio with specific avatar images.
Voiceover Production: Convert long-form text into natural-sounding speech for accessibility tools or multimedia presentations.

Example Prompts

"Generate a photorealistic portrait of an astronaut walking on Mars using the realistic style setting."
"Create an 8-second video of a cyberpunk city street at night using the LTX-2 model."
"Synthesize an audio file with a female voice saying 'Welcome to the open source future' in English."

Tips & Limitations

Hardware Constraints: Generation times depend on the specific GPU assigned to your host. High-resolution images and longer videos take more time to generate.
Model Compatibility: Always ensure the input image dimensions for SadTalker match the recommended aspect ratios to prevent artifacts.
Output Management: Files are saved to /data/ai-stack/output/. Remember to periodically clear this directory to prevent storage overflow on your GPU server.
Language Support: Voice synthesis performance varies by language; always verify the language code against the supported list in the Voxtral documentation.
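For the output-management tip, one possible cleanup routine is sketched below in Python, intended to run on the GPU server itself. The function names and the 7-day retention window are assumptions made for illustration; the skill does not ship this script, and only the /data/ai-stack/output/ path comes from the listing.

```python
import time
from pathlib import Path


def find_stale_files(directory, max_age_days, now=None):
    """Return files under `directory` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def prune(directory, max_age_days, dry_run=True):
    """Delete stale files; with dry_run=True only report what would be removed."""
    stale = find_stale_files(directory, max_age_days)
    for path in stale:
        if not dry_run:
            path.unlink()
    return stale


if __name__ == "__main__":
    # Assumption: a 7-day retention window; adjust to your storage budget.
    for path in prune("/data/ai-stack/output/", max_age_days=7, dry_run=True):
        print("would remove:", path)
```

The dry run is the default so that a first invocation only lists candidates; pass dry_run=False once the list looks right.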

Metadata

Author: @bowen31337

Stars: 4190

Updated: 2026-04-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-bowen31337-ai-media": {
      "enabled": true,
      "auto_update": true
    }
  }
}
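If clawhub.json already lists other plugins, pasting the snippet over the file would drop them. The Python sketch below merges the entry instead. It assumes the file is plain JSON with a top-level "plugins" object, as the snippet above suggests, and that it sits in the current directory; the helper names are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-bowen31337-ai-media"


def enable_plugin(config, plugin_id=PLUGIN_ID, auto_update=True):
    """Return a copy of `config` with the plugin entry added or updated."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": auto_update})
    plugins[plugin_id] = entry
    merged["plugins"] = plugins
    return merged


def update_config_file(path="clawhub.json"):
    """Read the file if present, merge the entry, and write it back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Calling update_config_file() leaves existing plugin entries untouched and only adds or updates the ai-media entry.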

Tags (AI)

#media #generation #video #gpu #ai

Safety Score: 3/5

Flags: network-access, file-write, file-read, code-execution
