sora
Generate videos from text prompts or reference images using OpenAI Sora.
✅ USE WHEN:
- Need AI-generated video from text description
- Want image-to-video (animate a still image)
- Creating cinematic/artistic video content
- Need motion/animation without lip-sync
❌ DON'T USE WHEN:
- Need lip-sync (person speaking) → use veed-ugc or ugc-manual
- Just need image generation → use nano-banana-pro or morpheus
- Editing existing videos → use Remotion
- Need UGC-style talking head → use veed-ugc
INPUT: Text prompt + optional reference image
OUTPUT: MP4 video (various resolutions/durations)
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/pauldelavallaz/sora
Sora Video Generation
Generate videos using OpenAI's Sora API.
API Reference
Endpoint: POST https://api.openai.com/v1/videos
Parameters
| Parameter | Values | Description |
|---|---|---|
| prompt | string | Text description of the video (required) |
| input_reference | file | Optional image that guides generation |
| model | sora-2, sora-2-pro | Model to use (default: sora-2) |
| seconds | 4, 8, 12 | Video duration in seconds (default: 4) |
| size | 720x1280, 1280x720, 1024x1792, 1792x1024 | Output resolution |
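As a rough illustration of how the parameters in the table fit together, the sketch below validates a set of values and assembles them as form fields. The helper name `build_video_fields` is hypothetical, the allowed values are copied from the table, and nothing is sent over the network. How the fields are posted, and how `input_reference` is attached as a file, is deliberately left out.

```python
# Allowed values copied from the parameter table; the helper itself is hypothetical.
ENDPOINT = "https://api.openai.com/v1/videos"
MODELS = ("sora-2", "sora-2-pro")
SECONDS = (4, 8, 12)
SIZES = ("720x1280", "1280x720", "1024x1792", "1792x1024")


def build_video_fields(prompt, model="sora-2", seconds=4, size=None):
    """Validate the parameters and return them as string form fields."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}")
    if seconds not in SECONDS:
        raise ValueError(f"seconds must be one of {SECONDS}")
    fields = {"prompt": prompt.strip(), "model": model, "seconds": str(seconds)}
    if size is not None:
        if size not in SIZES:
            raise ValueError(f"size must be one of {SIZES}")
        fields["size"] = size
    return fields


fields = build_video_fields("A cat playing piano", seconds=8, size="1280x720")
print(ENDPOINT, fields)
```

Rejecting unsupported values locally avoids spending a request on a job that the API would refuse anyway.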
Important Notes
- The reference image's dimensions must match the video size exactly; the script resizes the image automatically
- Video generation typically takes 1-3 minutes
- Videos expire after about 1 hour, so download them immediately
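The first note says the reference image has to match the output size exactly. The script's own resizing logic is not shown here, so the following is only one plausible approach, sketched in pure arithmetic: center-crop the image to the target aspect ratio, after which it can be scaled to the exact size. `parse_size` and `center_crop_box` are hypothetical names.

```python
def parse_size(size):
    """Turn a size string such as '720x1280' into (width, height)."""
    width, height = size.lower().split("x")
    return int(width), int(height)


def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Return (left, top, right, bottom) of the largest centered region
    of the source that has the same aspect ratio as the target."""
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if src_w * dst_h > src_h * dst_w:
        # Source is too wide: keep the full height, trim the sides.
        crop_w = src_h * dst_w // dst_h
        left = (src_w - crop_w) // 2
        return (left, 0, left + crop_w, src_h)
    # Source is too tall (or already matches): keep the full width.
    crop_h = src_w * dst_h // dst_w
    top = (src_h - crop_h) // 2
    return (0, top, src_w, top + crop_h)


target = parse_size("720x1280")
box = center_crop_box(1920, 1080, *target)
print(target, box)
```

With an imaging library such as Pillow, a box like this could be passed to `Image.crop` and the result scaled with `Image.resize`. Whether the script crops, pads, or stretches is an assumption on my part.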
Usage
# Basic text-to-video
uv run ~/.clawdbot/skills/sora/scripts/generate_video.py \
  --prompt "A cat playing piano" \
  --filename "output.mp4"
# Image-to-video (auto-resizes image)
uv run ~/.clawdbot/skills/sora/scripts/generate_video.py \
  --prompt "Slow dolly shot, steam rising, warm lighting" \
  --filename "output.mp4" \
  --input-image "reference.png" \
  --seconds 8 \
  --size 720x1280
# With specific model
uv run ~/.clawdbot/skills/sora/scripts/generate_video.py \
  --prompt "Cinematic scene" \
  --filename "output.mp4" \
  --model sora-2-pro \
  --seconds 12
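For batch jobs it can be convenient to assemble these commands from Python. The sketch below only builds the argument list from the flags used in the examples. `build_command` is a hypothetical helper, and actually running the command (for instance with `subprocess.run`) is left to the caller.

```python
import os

# Script path as used in the usage examples.
SCRIPT = os.path.expanduser("~/.clawdbot/skills/sora/scripts/generate_video.py")


def build_command(prompt, filename, input_image=None, seconds=None, size=None, model=None):
    """Assemble the `uv run` argument list; optional flags are added only when set."""
    cmd = ["uv", "run", SCRIPT, "--prompt", prompt, "--filename", filename]
    optional = {
        "--input-image": input_image,
        "--seconds": seconds,
        "--size": size,
        "--model": model,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_command("Cinematic scene", "output.mp4", model="sora-2-pro", seconds=12)
print(" ".join(cmd))
```

Passing the arguments as a list rather than one shell string means prompts containing quotes or spaces need no extra escaping.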
Script Parameters
| Flag | Description | Default |
|---|---|---|
| --prompt, -p | Video description (required) | - |
| --filename, -f | Output file path (required) | - |
| --input-image, -i | Reference image path | None |
| --seconds, -s | Duration: 4, 8, or 12 | 8 |
| --size, -sz | Resolution | 720x1280 |
| --model, -m | sora-2 or sora-2-pro | sora-2 |
| --api-key, -k | OpenAI API key | env var |
| --poll-interval | Check status every N seconds | 10 |
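The `--poll-interval` flag suggests the script submits a job and then checks its status periodically until the video is ready. That loop is not shown in this document, so the following is a generic sketch. `fetch_status` is a hypothetical callable standing in for the real status check, and all status strings used here (`queued`, `in_progress`, `completed`, `failed`) are assumptions rather than values confirmed by this page.

```python
import time


def wait_for_video(fetch_status, poll_interval=10, timeout=600, sleep=time.sleep):
    """Call fetch_status() every poll_interval seconds until the job finishes.

    fetch_status is a hypothetical callable returning a status string.
    Returns the number of polls made; raises on failure or timeout.
    """
    waited = 0
    polls = 0
    while True:
        status = fetch_status()
        polls += 1
        if status == "completed":  # assumed terminal success status
            return polls
        if status == "failed":  # assumed terminal failure status
            raise RuntimeError("video generation failed")
        if waited >= timeout:
            raise TimeoutError(f"gave up after {waited} seconds")
        sleep(poll_interval)
        waited += poll_interval


# Simulated job that completes on the third check; sleeping is stubbed out.
statuses = iter(["queued", "in_progress", "completed"])
polls = wait_for_video(lambda: next(statuses), poll_interval=10, sleep=lambda s: None)
print(polls)
```

Injecting `sleep` keeps the loop testable without real waiting. At the default interval of 10 seconds, a job in the typical 1-3 minute range would be checked roughly 6 to 18 times.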
API Key
Set OPENAI_API_KEY environment variable or pass --api-key.
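The precedence between the flag and the environment variable is not stated. A common convention, assumed here, is that an explicit `--api-key` value wins over `OPENAI_API_KEY`. `resolve_api_key` is a hypothetical helper, not a function from the script.

```python
import os


def resolve_api_key(cli_value=None, env=None):
    """Return the key from the flag if given, else from OPENAI_API_KEY."""
    env = os.environ if env is None else env
    key = cli_value or env.get("OPENAI_API_KEY")
    if not key:
        raise SystemExit("No API key: set OPENAI_API_KEY or pass --api-key")
    return key


# Demonstration with a fake environment so no real key is needed.
fake_env = {"OPENAI_API_KEY": "sk-from-env"}
print(resolve_api_key(env=fake_env))
print(resolve_api_key("sk-from-flag", env=fake_env))
```

Keeping the key in the environment avoids leaving it in shell history, which is a reason to prefer the variable over the flag for routine use.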
Prompt Engineering for Video
Good prompts include:
- Camera movement: dolly, pan, zoom, tracking shot
- Motion description: swirling, rising, falling, shifting
- Lighting: golden hour, candlelight, dramatic rim lighting
- Atmosphere: steam, particles, bokeh, haze
- Mood/style: cinematic, commercial, lifestyle, editorial
Example prompts:
Food commercial:
Slow dolly shot of gourmet dish, soft morning sunlight streaming through window,
subtle steam rising, warm cozy atmosphere, premium food commercial aesthetic
Lifestyle:
Golden hour light slowly shifting across mountains, gentle breeze rustling leaves,
serene morning atmosphere, premium lifestyle commercial
Product shot:
Cinematic close-up, dramatic lighting with warm highlights,
slow reveal, luxury commercial style
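The example prompts share a loose pattern: camera movement and subject first, then lighting, atmosphere, and style. A small helper can keep prompts consistent across a batch. `build_prompt` and its argument names are hypothetical, and the ordering is just one reading of the examples, not a rule of the API.

```python
def build_prompt(subject, camera=None, lighting=None, atmosphere=None, style=None):
    """Join the non-empty prompt elements into one comma-separated description."""
    lead = f"{camera} of {subject}" if camera else subject
    parts = [lead, lighting, atmosphere, style]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="gourmet dish",
    camera="Slow dolly shot",
    lighting="soft morning sunlight streaming through window",
    atmosphere="subtle steam rising",
    style="premium food commercial aesthetic",
)
print(prompt)
```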
Workflow: Image → Video
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-pauldelavallaz-sora": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Related Skills
veed-ugc
Generate UGC-style promotional videos with AI lip-sync. Takes an image (person with product from Morpheus/Ad-Ready) and a script (pure dialogue), creates a video of the person speaking. Uses ElevenLabs for voice synthesis.
ugc-manual
Generate lip-sync video from image + user's own audio recording.
✅ USE WHEN:
- User provides their OWN audio file (voice recording)
- Want to sync image to specific audio/voice
- User recorded the script themselves
- Need exact audio timing preserved
❌ DON'T USE WHEN:
- User provides text script (not audio) → use veed-ugc
- Need AI to generate the voice → use veed-ugc
- Don't have audio file yet → use veed-ugc with script
INPUT: Image + audio file (user's recording)
OUTPUT: MP4 video with lip-sync to provided audio
KEY DIFFERENCE:
- veed-ugc = script → AI voice → video
- ugc-manual = user audio → video (no voice generation)
ad-ready
Generate advertising images automatically from a product URL + brand profile.
✅ USE WHEN:
- User provides a product URL (e-commerce link)
- Want automated product scraping + image generation
- Have a brand profile to apply (70+ brands available)
- Need funnel-stage targeting (awareness/consideration/conversion)
- Want AI to auto-select model, scene, lighting based on brand
❌ DON'T USE WHEN:
- User provides local product image file → use morpheus-fashion-design
- Don't need a person in the image → use nano-banana-pro
- Want manual control over model, scene, packs → use morpheus-fashion-design
- Already have hero image, need variations → use multishot-ugc
- Need video output → use veed-ugc after image generation
INPUT: Product URL + brand name (optional) + funnel stage (optional)
OUTPUT: PNG advertising image with product + model
morpheus-fashion-design
Generate professional advertising images with AI models holding/wearing products.
✅ USE WHEN:
- Need a person/model in the image WITH a product
- Creating fashion ads, product campaigns, commercial photography
- Want consistent model face across multiple shots
- Need professional lighting/camera simulation
- Input: product image + model reference (or catalog)
❌ DON'T USE WHEN:
- Just editing/modifying an existing image → use nano-banana-pro
- Product-only shot without a person → use nano-banana-pro
- Already have the hero image, need variations → use multishot-ugc
- Need video, not image → use veed-ugc after generating image
- URL-based product fetch with brand profile → use ad-ready instead
OUTPUT: Single high-quality PNG image (2K-4K resolution)