Official Verified media Safety 4/5

mm-easy-voice

Simple text-to-speech skill using MiniMax Voice API. Converts text to audio with customizable voice selection. Use for generating speech audio from text.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/blue-coconut/mm-easy-voice

What This Skill Does

The mm-easy-voice skill wraps the MiniMax Voice API so that OpenClaw agents can perform text-to-speech (TTS) conversion, turning plain text into audio files. It supports a range of voices, custom pause insertion for a more natural cadence, voice cloning from samples, and voice design from descriptive prompts. It is intended for automation workflows that need audio output, such as voice-overs, accessibility features, or dynamic media generation.

Installation

To install this skill, run the following command in your terminal:

clawhub install openclaw/skills/skills/blue-coconut/mm-easy-voice

After installing, verify your environment by running python check_environment.py, and set the MINIMAX_VOICE_API_KEY environment variable. Installing FFmpeg is recommended for advanced audio manipulation.
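The two prerequisites above can also be checked by hand. The following is a minimal sketch, not the skill's own check_environment.py (whose contents are not shown here); the function name check_voice_environment is hypothetical. It only confirms that the API key variable is non-empty and that an ffmpeg executable is on the PATH.

```python
import os
import shutil


def check_voice_environment(env=None):
    """Return a list of problems found; an empty list means nothing was flagged.

    Hypothetical helper for illustration only. It does not call the
    MiniMax API, so it cannot tell whether the key is actually valid.
    """
    env = os.environ if env is None else env
    problems = []
    # The variable name comes from the installation notes above.
    if not env.get("MINIMAX_VOICE_API_KEY", "").strip():
        problems.append("MINIMAX_VOICE_API_KEY is not set")
    # FFmpeg is recommended rather than required, so report it as a warning.
    if shutil.which("ffmpeg") is None:
        problems.append("warning: ffmpeg not found on PATH")
    return problems
```

Calling check_voice_environment() with no arguments inspects the current process environment; passing a dictionary makes the key check easy to try out in isolation.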

Use Cases

  • Content Creation: Generate voice-overs for video projects or social media content directly from text scripts.
  • Accessibility: Convert written reports, articles, or documentation into audio files, as an alternative to screen readers or for audio-only consumption.
  • Virtual Assistants: Power the voice interface of custom agents that need to respond to user queries with human-like, emotional speech.
  • Multimedia Automation: Automate the assembly of audiobooks or podcasts by concatenating multiple generated audio clips into a final file.
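For the concatenation use case, one common approach is FFmpeg's concat demuxer, which reads a text file listing the clips in order. The sketch below is an assumption about how such a step could be prepared, not a description of what the skill does internally; build_concat_job and the clips.txt file name are hypothetical. It only builds the list file contents and the command line, and it assumes all clips share the same codec and parameters so that stream copy works.

```python
def build_concat_job(clips, output, list_path="clips.txt"):
    """Build the concat list text and the ffmpeg argument vector.

    Hypothetical helper: nothing is written to disk or executed here.
    """
    if not clips:
        raise ValueError("need at least one clip")
    lines = []
    for clip in clips:
        # Inside the list file, a single quote in a path is written as '\''.
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    list_text = "\n".join(lines) + "\n"
    argv = [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow paths the demuxer would otherwise reject
        "-i", list_path,
        "-c", "copy",     # copy streams without re-encoding
        str(output),
    ]
    return list_text, argv
```

To run the job, write list_text to list_path and pass argv to subprocess.run(argv, check=True). If the clips differ in sample rate or codec, re-encoding instead of "-c copy" would be needed.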

Example Prompts

  1. "Generate an audio file titled 'intro.mp3' using the 'female-shaonv' voice that says 'Welcome to the presentation, please hold for a moment' with a 2-second pause before the final word."
  2. "List all available voices so I can choose a professional tone for my corporate announcement."
  3. "Clone the voice from 'interview.mp3' and save it as 'custom-agent-voice' for future use."

Tips & Limitations

  • Text Limits: Each request is limited to 10,000 characters. Break longer scripts into smaller segments and submit them as separate requests.
  • Custom Pauses: Improve natural flow by inserting tags like <#1.5#> to force specific pauses where appropriate.
  • Voice Selection: Always check reference/voice_catalog.md to ensure you are selecting the voice best suited for your target audience and emotional context.
  • Emotion: The speech-2.8 models automatically detect and apply emotional nuance, so ensure your written text clearly conveys the intended tone.
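The text limit and pause tag tips can be handled before any text is sent to the skill. Below is a minimal sketch under stated assumptions: pause_tag and split_script are hypothetical helpers, the tag format is taken from the <#1.5#> example above, and the splitter simply prefers sentence boundaries. It does not call the MiniMax API.

```python
import re

MAX_CHARS = 10_000  # per-request limit stated in the tips above


def pause_tag(seconds):
    """Format a pause marker in the <#seconds#> style, e.g. <#1.5#>."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    return f"<#{seconds:g}#>"


def split_script(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Chunks break after sentence-ending punctuation where possible; a single
    sentence longer than the limit is cut at the limit as a fallback.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Fallback: hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, "Welcome to the presentation, please hold for a " + pause_tag(2) + " moment" places a 2-second pause before the final word. Note that the hard-split fallback could cut through a pause tag, so very long unpunctuated passages are worth checking by hand.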

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-blue-coconut-mm-easy-voice": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#tts #audio #voice-cloning #media #automation
Safety Score: 4/5

Flags: file-write, file-read, external-api, code-execution