Official Verified

mm-voice-maker

Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creating custom voices, or processing/merging audio.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/blue-coconut/mm-voice-maker

MiniMax Voice Maker

Professional text-to-speech skill with emotion detection, voice cloning, and audio processing capabilities powered by MiniMax Voice API and FFmpeg.

Capabilities

Area            Features
TTS             Sync (HTTP/WebSocket), async (long text), streaming
Segment-based   Multi-voice, multi-emotion synthesis from segments.json, auto merge
Voice           Cloning (10s–5min), design (text prompt), management
Audio           Format conversion, merge, normalize, trim, remove silence (FFmpeg)

File structure:

mmVoice_Maker/
├── SKILL.md                       # This overview
├── mmvoice.py                     # CLI tool (recommended for Agents)
├── check_environment.py           # Environment verification
├── requirements.txt
├── scripts/                       # Entry: scripts/__init__.py
│   ├── utils.py                   # Config, data classes
│   ├── sync_tts.py                # HTTP/WebSocket TTS
│   ├── async_tts.py               # Long text TTS
│   ├── segment_tts.py             # Segment-based TTS (multi-voice, multi-emotion)
│   ├── voice_clone.py             # Voice cloning
│   ├── voice_design.py            # Voice design
│   ├── voice_management.py        # List/delete voices
│   └── audio_processing.py        # FFmpeg audio tools
└── reference/                     # Load as needed
    ├── cli-guide.md               # CLI usage guide
    ├── getting-started.md         # Setup and quick test
    ├── tts-guide.md               # Sync/async TTS workflows
    ├── voice-guide.md             # Clone/design/manage
    ├── audio-guide.md             # Audio processing
    ├── script-examples.md         # Runnable code snippets
    ├── troubleshooting.md         # Common issues
    ├── api_documentation.md       # Complete API reference
    └── voice_catalog.md           # Voice selection guide

Main Workflow Guideline (Text to Speech)

6-step workflow:

[step1]. Verify environment

[step2-preparation]. ⚠️ NOTE: Before processing the text, read reference/voice_catalog.md to select voices.

[step2]. Process text into script → <cwd>/audio/segments.json. Note: [Step2.4] is especially important. Check it twice before sending the script to the user.

[step2.5]. ⚠️ Generate preview for user confirmation (highly recommended for multi-voice content)

[step3]. Present plan to user for confirmation

[step4]. Validate segments.json

[step5]. Generate and merge audio → intermediate files in <cwd>/audio/tmp/, final output in <cwd>/audio/output.mp3

[step6]. ⚠️ CRITICAL: User confirms audio quality FIRST → THEN clean up temp files (only after the user is satisfied)

<cwd> is Claude's current working directory (not the skill directory). Audio files are saved relative to where Claude is running commands.
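
The path layout and the cleanup rule from steps 5 and 6 can be sketched in Python as follows. This is a minimal illustration, not code from the skill: the helper names audio_paths and cleanup_tmp and the user_confirmed flag are hypothetical, and only the three paths come from the workflow above.

```python
from pathlib import Path
import shutil


def audio_paths(cwd: Path) -> dict[str, Path]:
    """Return the locations named in the workflow, relative to cwd."""
    audio = cwd / "audio"
    return {
        "segments": audio / "segments.json",  # script produced in step 2
        "tmp": audio / "tmp",                 # intermediate files from step 5
        "output": audio / "output.mp3",       # final merged audio
    }


def cleanup_tmp(cwd: Path, user_confirmed: bool) -> bool:
    """Delete audio/tmp only after the user has accepted the final audio.

    Returns True if the directory was removed, False otherwise.
    """
    tmp = audio_paths(cwd)["tmp"]
    if not user_confirmed or not tmp.exists():
        return False
    shutil.rmtree(tmp)
    return True
```

Calling cleanup_tmp(Path.cwd(), user_confirmed=False) leaves the temporary files in place, so individual segments can still be regenerated if the user asks for changes.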

Step 1: Verify environment

python check_environment.py
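
The contents of check_environment.py are not shown on this page, so the following is only a sketch of the kind of checks such a script might run. It assumes the skill needs FFmpeg on the PATH (the capabilities above depend on it) and an API key in an environment variable; the variable name MINIMAX_API_KEY and the function find_problems are assumptions made for illustration.

```python
import os
import shutil


def find_problems(env=None, which=shutil.which) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    # FFmpeg is required for merging and post-processing audio.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Assumed variable name; check the skill's getting-started guide.
    if not env.get("MINIMAX_API_KEY"):
        problems.append("MINIMAX_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in find_problems():
        print("MISSING:", problem)
```

Passing env and which as parameters keeps the check testable without touching the real system.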

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-blue-coconut-mm-voice-maker": {
      "enabled": true,
      "auto_update": true
    }
  }
}
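
If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. A small Python sketch of merging the entry instead is shown below. The function name enable_plugin is hypothetical and the location of clawhub.json is not stated here, so the path is passed in by the caller.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-blue-coconut-mm-voice-maker"


def enable_plugin(config_path: Path) -> dict:
    """Add the plugin entry to clawhub.json, keeping existing entries."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    plugins = config.setdefault("plugins", {})
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```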
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.