wavespeed-wan-26
Generate videos using Alibaba's Wan 2.6 model via WaveSpeed AI. Supports text-to-video and image-to-video generation of clips up to 15 seconds long at 720p or 1080p. Features audio-guided generation, prompt expansion, multi-shot mode, and configurable seeds. Use when the user wants to create videos from text prompts or animate images.
Install via CLI (Recommended)
```bash
clawhub install openclaw/skills/skills/chengzeyi/wavespeed-wan-26
```
WaveSpeedAI Wan 2.6 Video Generation
Generate videos using Alibaba's Wan 2.6 model via the WaveSpeed AI platform. Supports both text-to-video and image-to-video generation with up to 15 seconds of video at up to 1080p resolution.
Authentication
```bash
export WAVESPEED_API_KEY="your-api-key"
```
Get your API key at wavespeed.ai/accesskey.
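The examples below never pass a key explicitly, which suggests the SDK reads `WAVESPEED_API_KEY` from the environment; treat that as an assumption. A script can still fail fast with a clear message when the variable is missing. `requireApiKey` is a hypothetical helper, not part of the `wavespeed` package.

```javascript
// Hypothetical helper (not part of the wavespeed package): fail fast when the
// WAVESPEED_API_KEY environment variable is missing or blank.
function requireApiKey(env = process.env) {
  const key = (env.WAVESPEED_API_KEY || "").trim();
  if (key === "") {
    throw new Error(
      "WAVESPEED_API_KEY is not set; get a key at wavespeed.ai/accesskey and export it first."
    );
  }
  return key; // trimmed value, in case the export picked up stray whitespace
}
```

Calling `requireApiKey()` at the top of a script surfaces a missing key before any `wavespeed.run` request is attempted.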
Quick Start
Text-to-Video
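```javascript
// Uses top-level await: run as an ES module (.mjs, or "type": "module" in package.json).
import wavespeed from 'wavespeed';

const output_url = (await wavespeed.run(
  "alibaba/wan-2.6/text-to-video",
  { prompt: "A golden retriever running through a field of sunflowers at sunset" }
))["outputs"][0];

console.log(output_url); // URL of the generated video
```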
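The Quick Start examples index straight into `["outputs"][0]`, which throws a `TypeError` if the result has no `outputs` field and silently yields `undefined` if the array is empty. A minimal sketch of a safer accessor, assuming only the result shape these examples already rely on (an object whose `outputs` field is an array of URL strings); `firstOutputUrl` is a hypothetical name.

```javascript
// Hypothetical helper: return the first output URL from a wavespeed.run() result,
// assuming the { outputs: [url, ...] } shape used in the examples in this skill.
function firstOutputUrl(result) {
  const outputs = result && result.outputs;
  if (!Array.isArray(outputs) || outputs.length === 0) {
    // Include the raw result so the failure is easy to diagnose.
    throw new Error("Generation returned no outputs: " + JSON.stringify(result));
  }
  return outputs[0];
}
```

To use it, wrap the `await wavespeed.run(...)` call in `firstOutputUrl(...)` instead of indexing with `["outputs"][0]`.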
Image-to-Video
The image parameter accepts an image URL. If you have a local file, upload it first with wavespeed.upload() to get a URL.
```javascript
import wavespeed from 'wavespeed';

// Upload a local image to get a URL
const imageUrl = await wavespeed.upload("/path/to/photo.png");

const output_url = (await wavespeed.run(
  "alibaba/wan-2.6/image-to-video",
  {
    image: imageUrl,
    prompt: "The person in the photo slowly turns and smiles"
  }
))["outputs"][0];
```
You can also pass an existing image URL directly:
```javascript
import wavespeed from 'wavespeed';

const output_url = (await wavespeed.run(
  "alibaba/wan-2.6/image-to-video",
  {
    image: "https://example.com/photo.jpg",
    prompt: "The person in the photo slowly turns and smiles"
  }
))["outputs"][0];
```
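Since `image` expects a URL and local files go through `wavespeed.upload()` first, a caller that accepts either form can branch on the value. A minimal sketch, assuming anything starting with `http://` or `https://` is already a usable URL and everything else is a local path; `isRemoteUrl` and `resolveImage` are hypothetical, and the uploader is passed in so the helper itself has no SDK dependency.

```javascript
// Hypothetical helpers: treat http(s) URLs as ready to use and send anything
// else (assumed to be a local file path) through the supplied upload function.
function isRemoteUrl(value) {
  return /^https?:\/\//i.test(value);
}

async function resolveImage(value, upload) {
  if (isRemoteUrl(value)) {
    return value; // already a URL: pass it straight to the `image` parameter
  }
  return await upload(value); // e.g. an arrow function calling wavespeed.upload
}
```

Passing the uploader as `(p) => wavespeed.upload(p)` rather than `wavespeed.upload` keeps the call bound to the SDK object in case `upload` relies on `this`.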
API Endpoints
Text-to-Video
Model ID: alibaba/wan-2.6/text-to-video
Generate videos from text prompts.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `prompt` | string | Yes | -- | Text description of the video to generate |
| `negative_prompt` | string | No | -- | Text description of what to avoid in the video |
| `audio` | string | No | -- | URL of an audio file to guide generation |
| `size` | string | No | `1280*720` | Output size in pixels (`width*height`). One of: `1280*720`, `720*1280`, `1920*1080`, `1080*1920` |
| `duration` | integer | No | `5` | Video duration in seconds. One of: `5`, `10`, `15` |
| `shot_type` | string | No | `single` | Shot type. One of: `single`, `multi` |
| `enable_prompt_expansion` | boolean | No | `false` | Run the prompt optimizer to expand and enhance the prompt before generation |
| `seed` | integer | No | `-1` | Random seed; `-1` picks a random seed. Range: `-1` to `2147483647` |
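The constraints in this table are easy to check locally before spending a request. A minimal sketch that applies the documented defaults and rejects out-of-range values; `buildTextToVideoInput` is a hypothetical helper and covers only the parameters listed in the table.

```javascript
// Hypothetical validator for the alibaba/wan-2.6/text-to-video parameters
// documented in the table. Returns a new object with the table's defaults applied.
const SIZES = ["1280*720", "720*1280", "1920*1080", "1080*1920"];
const DURATIONS = [5, 10, 15];
const SHOT_TYPES = ["single", "multi"];
const MAX_SEED = 2147483647;

function buildTextToVideoInput(params) {
  const input = {
    size: "1280*720",
    duration: 5,
    shot_type: "single",
    enable_prompt_expansion: false,
    seed: -1,
    ...params, // caller-supplied values override the defaults
  };
  if (typeof input.prompt !== "string" || input.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  for (const key of ["negative_prompt", "audio"]) {
    if (input[key] !== undefined && typeof input[key] !== "string") {
      throw new Error(`${key} must be a string`);
    }
  }
  if (!SIZES.includes(input.size)) {
    throw new Error(`size must be one of ${SIZES.join(", ")}`);
  }
  if (!DURATIONS.includes(input.duration)) {
    throw new Error(`duration must be one of ${DURATIONS.join(", ")}`);
  }
  if (!SHOT_TYPES.includes(input.shot_type)) {
    throw new Error(`shot_type must be one of ${SHOT_TYPES.join(", ")}`);
  }
  if (typeof input.enable_prompt_expansion !== "boolean") {
    throw new Error("enable_prompt_expansion must be a boolean");
  }
  if (!Number.isInteger(input.seed) || input.seed < -1 || input.seed > MAX_SEED) {
    throw new Error(`seed must be an integer from -1 to ${MAX_SEED}`);
  }
  return input;
}
```

Because the filled-in defaults equal the table's defaults, sending them explicitly should not change the request; the benefit is that a typo such as `duration: 7` fails locally instead of at the API.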
Example
```javascript
import wavespeed from 'wavespeed';

const output_url = (await wavespeed.run(
  "alibaba/wan-2.6/text-to-video",
  {
    prompt: "A timelapse of a city skyline transitioning from day to night, cinematic",
    negative_prompt: "blurry, low quality, distorted",
    size: "1920*1080",
    duration: 10,
    shot_type: "single",
    seed: 42
  }
))["outputs"][0];
```
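The example fixes `seed: 42` instead of using the random default. Fixing the seed is the usual way to make a run repeatable, though whether WaveSpeed guarantees identical output for identical inputs is an assumption here. To compare a few candidates and keep the seed of the one you like, one option is to render the same input over a short list of seeds. `seedVariants` is a hypothetical helper that only builds the inputs; it rejects `-1` because a random seed would defeat the purpose.

```javascript
// Hypothetical helper: one copy of `base` per explicit seed, so the same prompt
// can be rendered several times and the preferred seed reused later.
function seedVariants(base, seeds) {
  return seeds.map((seed) => {
    if (!Number.isInteger(seed) || seed < 0 || seed > 2147483647) {
      throw new Error(`seed must be an integer from 0 to 2147483647, got ${seed}`);
    }
    return { ...base, seed }; // shallow copy with the seed overridden
  });
}
```

Each element can then be passed as the second argument to `wavespeed.run("alibaba/wan-2.6/text-to-video", input)`, one request per seed.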
Image-to-Video
Model ID: alibaba/wan-2.6/image-to-video
Animate a source image into a video using a text prompt.
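The two endpoints share the `alibaba/wan-2.6/` prefix and differ in whether a source `image` is supplied. A minimal sketch of routing between them, assuming a caller builds one input object and lets the presence of `image` decide; `pickWanModel` is a hypothetical name, and it only chooses the model ID without checking which other parameters each endpoint accepts.

```javascript
// Hypothetical router: choose the Wan 2.6 endpoint from the shape of the input.
const TEXT_TO_VIDEO = "alibaba/wan-2.6/text-to-video";
const IMAGE_TO_VIDEO = "alibaba/wan-2.6/image-to-video";

function pickWanModel(input) {
  // A non-empty `image` string (an image URL) selects image-to-video.
  return typeof input.image === "string" && input.image !== ""
    ? IMAGE_TO_VIDEO
    : TEXT_TO_VIDEO;
}
```

A caller would then invoke `wavespeed.run(pickWanModel(input), input)`.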
Parameters
Paste this into your clawhub.json to enable this plugin.
```json
{
  "plugins": {
    "official-chengzeyi-wavespeed-wan-26": {
      "enabled": true,
      "auto_update": true
    }
  }
}
```
Related Skills
wavespeed-watermark-remover
Remove watermarks, logos, captions, and text overlays from images and videos using WaveSpeed AI. Intelligently detects and removes watermarks while preserving texture and background. Supports images and videos up to 10 minutes. Use when the user wants to remove watermarks or text overlays from media.
wavespeed-face-swapper
Swap faces in images and videos using WaveSpeed AI. Supports image face swap and video face swap with multi-face targeting. Produces watermark-free results with automatic lighting and skin tone adaptation. Use when the user wants to replace a face in an image or video with another face.
wavespeed-infinitetalk
Generate talking head videos from a portrait image and audio using WaveSpeed AI's InfiniteTalk model. Produces lip-synced video up to 10 minutes long at 480p or 720p. Supports optional mask images to target specific faces and text prompts for additional guidance. Use when the user wants to animate a face with audio or create talking avatar videos.
wavespeed-minimax-speech-26
Convert text to speech using MiniMax Speech 2.6 Turbo via WaveSpeed AI. Features ultra-human voice cloning, sub-250ms latency, 40+ languages, emotion control, and 200+ voice presets. Use when the user wants to generate speech audio from text.
wavespeed-nano-banana-2
Generate and edit images using Google's Nano Banana 2 model via WaveSpeed AI. Supports text-to-image generation and image editing with natural language prompts. Features native 4K resolution, flexible aspect ratios including ultra-narrow (1:8, 8:1), multilingual text rendering, and camera-style controls. Use when the user wants to create images from text or edit existing images.