qwen3-tts-mlx
Local Qwen3-TTS speech synthesis on Apple Silicon via MLX. Use for offline narration, audiobooks, video voiceovers, and multilingual TTS.
Why use this skill?
Generate high-quality, multilingual speech locally on your Mac using the Qwen3-TTS MLX skill. Well suited to offline narration, voice cloning, and audio content creation.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/h1bomb/qwen3-tts-mlx
What This Skill Does
The qwen3-tts-mlx skill runs Qwen3-TTS speech synthesis locally on Apple Silicon hardware using the MLX framework. It offers multilingual support (11 languages), voice cloning, and voice design from text descriptions. Unlike cloud-based TTS services, this skill runs entirely offline, which keeps your text and audio on your own machine and avoids network latency and per-request API costs. It uses the unified memory architecture of M-series chips to generate high-quality, expressive speech for media projects.
Installation
To integrate this skill, ensure you have an Apple Silicon Mac. First, run the installation command within your OpenClaw environment:
clawhub install openclaw/skills/skills/h1bomb/qwen3-tts-mlx
Then install the dependencies: run pip install mlx-audio, and confirm that ffmpeg is available on your PATH (if it is missing, install it with brew install ffmpeg).
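The prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the skill itself: check_environment is a hypothetical helper, and it assumes that the mlx-audio package is importable under the module name mlx_audio.

```python
import importlib.util
import platform
import shutil


def check_environment() -> dict:
    """Report whether each prerequisite appears to be satisfied."""
    return {
        # Apple Silicon Macs report Darwin / arm64
        # (a Python running under Rosetta would report x86_64 instead).
        "apple_silicon": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        # Assumption: `pip install mlx-audio` provides a module named mlx_audio.
        "mlx_audio": importlib.util.find_spec("mlx_audio") is not None,
        # ffmpeg must be discoverable on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Each entry maps to one requirement, so a "missing" line points at the step to repeat.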
Use Cases
This skill is perfect for creators and developers who need robust audio output without external API dependencies. Common use cases include:
- Media Production: Generating video voiceovers, character lines for games, or podcast intros.
- Accessibility: Creating audiobooks or turning document summaries into spoken audio for people who prefer or need to listen rather than read.
- Prototyping: Rapid iteration of voice interfaces or synthetic narration for local AI agents.
- Localization: Providing consistent multilingual support for content creators targeting global audiences.
Example Prompts
- "Generate an energetic English narration for my product video using the Ryan voice, keep it under 30 seconds."
- "Use the Uncle_Fu voice to read this news script in Chinese, and apply a calm, professional news anchor tone."
- "Create a voice clone from my file at 'reference.wav' and use it to read the following text: 'Welcome to the local AI future.'"
Tips & Limitations
- Memory Management: Make sure you have enough free memory for the model you choose; the CustomVoice model requires ~4GB, while the VoiceDesign model requires ~5GB. If your system has 8GB of total memory, close resource-heavy applications before generating.
- Style Control: The --instruct flag is powerful. Be descriptive with your emotional cues (e.g., 'excited', 'whispering', 'authoritative') to get the best results.
- Hardware: This skill requires Apple Silicon (M1/M2/M3/M4). It will not function on Intel-based Macs or non-Apple hardware. Always select the variant (Base vs. CustomVoice vs. VoiceDesign) that best fits your available unified memory.
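The memory guidance above can be turned into a quick pre-flight check. This is a rough sketch under stated assumptions: the footprints are the approximate figures quoted in the tips, the Base variant is left out because no figure is given for it, and the 2GB reserve for the OS and other applications is a hypothetical default rather than a documented requirement.

```python
import os

# Approximate footprints quoted in the tips, in GB.
# The Base variant is omitted because no figure is given for it.
MODEL_MEMORY_GB = {"CustomVoice": 4.0, "VoiceDesign": 5.0}


def total_memory_gb() -> float:
    """Total physical memory in GB, read via POSIX sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def variants_that_fit(total_gb: float, reserve_gb: float = 2.0) -> list[str]:
    """Variants whose footprint fits once reserve_gb is set aside.

    reserve_gb is a hypothetical allowance for the OS and other apps.
    """
    budget = total_gb - reserve_gb
    return [name for name, need in MODEL_MEMORY_GB.items() if need <= budget]


if __name__ == "__main__":
    total = total_memory_gb()
    print(f"{total:.1f} GB total, candidates: {variants_that_fit(total)}")
```

Total memory is only an upper bound; what is actually free at generation time depends on what else is running.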
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-h1bomb-qwen3-tts-mlx": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, code-execution
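If clawhub.json already contains other plugins, the plugin entry shown earlier has to be merged in rather than pasted over the file. The sketch below shows one way to do that. enable_plugin and update_config_file are hypothetical helpers, and the location of clawhub.json is left to the caller because this page does not specify it.

```python
import json
from pathlib import Path

# Plugin key and settings copied from the clawhub.json fragment on this page.
PLUGIN_KEY = "official-h1bomb-qwen3-tts-mlx"
PLUGIN_SETTINGS = {"enabled": True, "auto_update": True}


def enable_plugin(config: dict) -> dict:
    """Return a copy of config with the plugin entry merged in."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[PLUGIN_KEY] = dict(PLUGIN_SETTINGS)
    merged["plugins"] = plugins
    return merged


def update_config_file(path: Path) -> None:
    """Read the config file if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Working on a copy keeps existing plugin entries and any other top-level keys intact.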