mm-voice-maker
Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creating custom voices, or processing/merging audio.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/blue-coconut/mm-voice-maker
MiniMax Voice Maker
Professional text-to-speech skill with emotion detection, voice cloning, and audio processing capabilities powered by MiniMax Voice API and FFmpeg.
Capabilities
| Area | Features |
|---|---|
| TTS | Sync (HTTP/WebSocket), async (long text), streaming |
| Segment-based | Multi-voice, multi-emotion synthesis from segments.json, auto merge |
| Voice | Cloning (10s–5min), design (text prompt), management |
| Audio | Format conversion, merge, normalize, trim, remove silence (FFmpeg) |
File structure:
mmVoice_Maker/
├── SKILL.md # This overview
├── mmvoice.py # CLI tool (recommended for Agents)
├── check_environment.py # Environment verification
├── requirements.txt
├── scripts/ # Entry: scripts/__init__.py
│ ├── utils.py # Config, data classes
│ ├── sync_tts.py # HTTP/WebSocket TTS
│ ├── async_tts.py # Long text TTS
│ ├── segment_tts.py # Segment-based TTS (multi-voice, multi-emotion)
│ ├── voice_clone.py # Voice cloning
│ ├── voice_design.py # Voice design
│ ├── voice_management.py # List/delete voices
│ └── audio_processing.py # FFmpeg audio tools
└── reference/ # Load as needed
  ├── cli-guide.md # CLI usage guide
  ├── getting-started.md # Setup and quick test
  ├── tts-guide.md # Sync/async TTS workflows
  ├── voice-guide.md # Clone/design/manage
  ├── audio-guide.md # Audio processing
  ├── script-examples.md # Runnable code snippets
  ├── troubleshooting.md # Common issues
  ├── api_documentation.md # Complete API reference
  └── voice_catalog.md # Voice selection guide
Main Workflow Guideline (Text to Speech)
6-step workflow:
[step1]. Verify environment
[step2-preparation]. ⚠️ NOTE: Before processing the text, you must read voice_catalog.md for voice selection.
[step2]. Process text into script → <cwd>/audio/segments.json. Note: [step2.4] is critical; check it twice before sending the script to the user.
[step2.5]. ⚠️ Generate preview for user confirmation (highly recommended for multi-voice content)
[step3]. Present plan to user for confirmation
[step4]. Validate segments.json
[step5]. Generate and merge audio → intermediate files in <cwd>/audio/tmp/, final output in <cwd>/audio/output.mp3
[step6]. ⚠️ CRITICAL: User confirms audio quality FIRST → THEN clean up temp files (only after the user is satisfied)
<cwd> is Claude's current working directory (not the skill directory). Audio files are saved relative to where Claude is running commands.
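The path conventions and the cleanup ordering in the workflow can be sketched in Python as follows. This is a minimal illustration and not part of the skill: the helper names `audio_paths` and `cleanup_tmp` are hypothetical, and only the directory layout (`<cwd>/audio/segments.json`, `<cwd>/audio/tmp/`, `<cwd>/audio/output.mp3`) is taken from the steps listed here.

```python
import shutil
from pathlib import Path


def audio_paths(cwd=None):
    """Return the workflow's file locations relative to the working directory.

    Hypothetical helper: the skill itself may resolve paths differently.
    """
    base = (Path(cwd) if cwd is not None else Path.cwd()) / "audio"
    return {
        "segments": base / "segments.json",  # script produced in step 2
        "tmp": base / "tmp",                 # intermediate files from step 5
        "output": base / "output.mp3",       # final merged audio from step 5
    }


def cleanup_tmp(paths, user_confirmed):
    """Delete intermediate files only after the user has approved the audio.

    Returns True if cleanup ran, False if it was skipped (step 6 ordering).
    """
    if not user_confirmed:
        return False
    shutil.rmtree(paths["tmp"], ignore_errors=True)
    return True
```

Passing the confirmation flag explicitly keeps the step 6 rule visible at the call site: temp files survive until the user has accepted the output.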
Step 1: Verify environment
python check_environment.py
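The contents of check_environment.py are not shown in this overview. As a rough idea of what a pre-flight check of this kind might cover, the sketch below verifies that FFmpeg is on PATH and that the Python version meets a minimum. The function name `preflight` and the 3.8 minimum are assumptions for illustration only, not the script's actual behavior.

```python
import shutil
import sys


def preflight(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK.

    Assumption: the real check_environment.py may test different things.
    """
    problems = []
    # Compare the running interpreter against the assumed minimum version.
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    # FFmpeg is needed for the audio post-processing features.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("MISSING:", problem)
```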
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-blue-coconut-mm-voice-maker": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Related Skills
mm-easy-voice
Simple text-to-speech skill using MiniMax Voice API. Converts text to audio with customizable voice selection. Use for generating speech audio from text.
mm-music-maker
Create music with MiniMax music models (e.g., music-2.5). Use when generating songs or instrumental tracks from lyrics and style prompts, or when integrating MiniMax Music Generation API into scripts.
mm-music-expert
Create music with MiniMax music models (music-2.5+, music-2.5). Use when generating songs, instrumental tracks, or chanting from lyrics and style prompts via MiniMax Music Generation API. Guides music novices through an interactive workflow to produce professional-quality music.