minimax-tools
Direct MiniMax API integration for speech synthesis (TTS), voice cloning, image generation, video generation, and music generation using local Python scripts instead of MCP. Use when you want reliable script-based MiniMax workflows inside OpenClaw for: (1) text-to-speech with built-in Chinese/English defaults or explicit voice IDs, (2) voice cloning with upload + preview flows, (3) text-to-image or reference-image generation, (4) text-to-video, image-to-video, or first/last-frame video generation with async polling/download, and (5) music generation from prompts and lyrics.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/cytwyatt/minimax-tools-skill
MiniMax Tools
Use this skill to call MiniMax multimodal APIs directly through local Python wrappers instead of relying on an external MCP server.
Overview
This skill currently supports:
- Speech synthesis (TTS)
- Voice cloning
- Image generation
- Video generation
- Music generation
All wrappers are exposed through a single entrypoint script:
python3 scripts/minimax.py <subcommand> ...
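For orientation, here is one way a single entrypoint can route subcommands to separate wrappers with argparse subparsers. This is a sketch, not the actual scripts/minimax.py. The handler functions run_tts and run_stub are hypothetical stand-ins, and only tts accepts an argument here.

```python
import argparse
from typing import Callable, List, Optional


# Hypothetical handlers: the real wrappers live in scripts/minimax_*.py and
# their function names and signatures are not documented on this page.
def run_tts(args: argparse.Namespace) -> str:
    return f"tts: {args.text!r}"


def run_stub(name: str) -> Callable[[argparse.Namespace], str]:
    def handler(args: argparse.Namespace) -> str:
        return f"{name}: not implemented in this sketch"
    return handler


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="minimax.py")
    # One subparser per subcommand; each stores its handler via set_defaults.
    sub = parser.add_subparsers(dest="command", required=True)

    tts = sub.add_parser("tts", help="speech synthesis")
    tts.add_argument("text")
    tts.set_defaults(handler=run_tts)

    for name in ("voice", "image", "video", "music"):
        sub.add_parser(name).set_defaults(handler=run_stub(name))
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return args.handler(args)


if __name__ == "__main__":
    print(main())
```

Keeping one parser with per-command handlers means shared concerns such as loading credentials can live in main() while each wrapper stays independent.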
Read references/api-notes.md only when you need endpoint details or parameter reminders.
Prerequisites
Expect this environment variable to be set before running the scripts:
MINIMAX_API_KEY
Optional:
MINIMAX_BASE_URL if you need to override the default API host
Python dependency:
requests
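A minimal sketch of reading these variables, assuming a wrapper wants to fail fast when the key is missing. This page does not state the default MiniMax host, so DEFAULT_BASE_URL below is a deliberate placeholder, and load_config and MiniMaxConfig are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Placeholder only: the real default API host is not given on this page.
DEFAULT_BASE_URL = "https://api.example.invalid"


@dataclass(frozen=True)
class MiniMaxConfig:
    api_key: str
    base_url: str


def load_config(env: Optional[Mapping[str, str]] = None) -> MiniMaxConfig:
    """Read MINIMAX_API_KEY (required) and MINIMAX_BASE_URL (optional)."""
    env = os.environ if env is None else env
    api_key = env.get("MINIMAX_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("MINIMAX_API_KEY is not set")
    # An empty or missing override falls back to the placeholder default.
    base_url = env.get("MINIMAX_BASE_URL") or DEFAULT_BASE_URL
    return MiniMaxConfig(api_key=api_key, base_url=base_url.rstrip("/"))
```

Passing a mapping instead of always reading os.environ keeps the function easy to test without touching the real environment.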
Routing guide
- Use tts for speech synthesis
- Use voice for uploading clone inputs, creating cloned voices, and optionally downloading preview audio
- Use image for text-to-image or reference-image generation
- Use video for text-to-video, image-to-video, or first/last-frame video workflows
- Use music for song or instrumental generation
TTS defaults
- Default model: speech-2.8-turbo
- Default format: mp3
- Default sample rate: 32000 Hz
- Default bitrate: 128000 bps
- Default Chinese voice: Chinese (Mandarin)_Lyrical_Voice
- Default English voice: English_Graceful_Lady
- If --voice is omitted, the script picks the default voice for --voice-lang zh|en, which itself defaults to zh
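The voice fallback rule above can be sketched as follows. The default values are copied from this page, but the dictionary key names (model, format, sample_rate, bitrate, voice_id) and the helper names are hypothetical; they are not the wrapper's internals or MiniMax request-body fields.

```python
from typing import Any, Dict, Optional

# Values from the TTS defaults listed above; key names are hypothetical.
TTS_DEFAULTS: Dict[str, Any] = {
    "model": "speech-2.8-turbo",
    "format": "mp3",
    "sample_rate": 32000,
    "bitrate": 128000,
}

DEFAULT_VOICES = {
    "zh": "Chinese (Mandarin)_Lyrical_Voice",
    "en": "English_Graceful_Lady",
}


def resolve_voice(voice: Optional[str] = None, voice_lang: str = "zh") -> str:
    """Return the explicit voice ID if given, else the default for voice_lang."""
    if voice:
        return voice
    if voice_lang not in DEFAULT_VOICES:
        raise ValueError(f"voice_lang must be 'zh' or 'en', got {voice_lang!r}")
    return DEFAULT_VOICES[voice_lang]


def tts_settings(voice: Optional[str] = None, voice_lang: str = "zh") -> Dict[str, Any]:
    """Merge the fixed defaults with the resolved voice (hypothetical shape)."""
    return {**TTS_DEFAULTS, "voice_id": resolve_voice(voice, voice_lang)}
```

An explicit voice ID, such as one returned by voice cloning, always wins over the language default.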
Voice cloning notes
- Clone source audio constraints:
  - mp3, m4a, or wav
  - 10 seconds to 5 minutes
  - <= 20 MB
- Optional prompt audio constraints:
  - mp3, m4a, or wav
  - under 8 seconds
  - <= 20 MB
- If cloning succeeds, the returned voice_id can be used immediately in TTS
- According to MiniMax documentation, cloned voices are temporary unless they are used in a real TTS call within 7 days
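Checking the constraints above locally before uploading avoids a wasted request. This is a minimal sketch: check_audio is a hypothetical helper, the caller must supply the duration (measuring it generally needs an audio library), and it assumes "20 MB" means 20,000,000 bytes, the stricter reading; MiniMax may count differently.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTS = {".mp3", ".m4a", ".wav"}
MAX_BYTES = 20_000_000  # assumption: "20 MB" read as 20 * 10**6 bytes


def check_audio(path: str, duration_s: float, size_bytes: int, *, prompt: bool = False) -> List[str]:
    """Return a list of constraint violations; an empty list means it looks OK."""
    problems: List[str] = []
    ext = Path(path).suffix.lower()
    if ext not in ALLOWED_EXTS:
        problems.append(f"unsupported format {ext or '(none)'}")
    if prompt:
        # Prompt audio must be strictly under 8 seconds.
        if duration_s >= 8:
            problems.append("prompt audio must be under 8 seconds")
    elif not 10 <= duration_s <= 300:
        problems.append("clone source must be 10 seconds to 5 minutes")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 20 MB")
    return problems
```

Returning every violation at once, rather than raising on the first, lets a caller report all problems with a file in one pass.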
Video support
Supported modes:
- text-to-video: video create
- image-to-video: video i2v
- first/last-frame video: video fl2v
Video creation is asynchronous. Use video query, video wait, and video download for task follow-up.
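A polling loop with exponential backoff and an overall timeout is one way to follow up on an async task without hammering the API. This sketch does not call MiniMax: query stands in for whatever video query returns, and is_done and is_failed are caller-supplied predicates because the exact status values are not listed on this page. wait_for_task and its default delays are hypothetical.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    query: Callable[[], Dict[str, Any]],
    *,
    is_done: Callable[[Dict[str, Any]], bool],
    is_failed: Callable[[Dict[str, Any]], bool],
    initial_delay: float = 5.0,
    max_delay: float = 60.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll query() until done, failed, or timed out, doubling the delay each time."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = query()
        if is_done(status):
            return status
        if is_failed(status):
            raise RuntimeError(f"task failed: {status}")
        # Give up rather than sleeping past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError("task did not finish before timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

Injecting sleep and clock keeps the loop testable without real waiting; once it returns, the finished task can be handed to video download.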
File handling rules
- Prefer saving outputs locally and returning file paths
- Local image inputs for image/video wrappers can be converted to Data URLs automatically
- Prefer URL-based output when MiniMax returns temporary files, then download immediately
- Avoid tight polling loops for async video jobs
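Converting a local image into a Data URL can be done with the standard library alone. A minimal sketch: to_data_url and as_image_ref are hypothetical names, the MIME type is guessed from the file extension, and remote URLs or existing Data URLs are passed through unchanged.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode a local image file as a base64 Data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def as_image_ref(value: str) -> str:
    """Pass URLs and Data URLs through; convert local paths to Data URLs."""
    if value.startswith(("http://", "https://", "data:")):
        return value
    return to_data_url(value)
```

Base64 inflates the payload by about a third, so a size check before encoding large images may be worthwhile.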
Resources
- scripts/minimax.py - unified CLI entrypoint
- scripts/minimax_tts.py - TTS wrapper
- scripts/minimax_voice.py - voice cloning wrapper
- scripts/minimax_image.py - image generation wrapper
- scripts/minimax_video.py - video generation wrapper
- scripts/minimax_music.py - music generation wrapper
- references/api-notes.md - focused API notes and constraints
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-cytwyatt-minimax-tools-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}