Official Verified media Safety 5/5

qwen3-tts-mlx

Local Qwen3-TTS speech synthesis on Apple Silicon via MLX. Use for offline narration, audiobooks, video voiceovers, and multilingual TTS.

Why use this skill?

Generate high-quality, multilingual speech locally on your Mac with the Qwen3-TTS MLX skill. It suits offline narration, voice cloning, and audio content creation.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/h1bomb/qwen3-tts-mlx

What This Skill Does

The qwen3-tts-mlx skill runs Qwen3-TTS speech synthesis locally on Apple Silicon hardware using the MLX framework. It supports multilingual output (11 languages), voice cloning, and voice design. Unlike cloud-based TTS services, it runs entirely offline, so your text and audio stay on your machine and there are no network round-trips or per-request API costs. It uses the unified memory architecture of M-series chips to generate high-quality, expressive speech for media projects.

Installation

This skill requires an Apple Silicon Mac. First, run the installation command within your OpenClaw environment:

clawhub install openclaw/skills/skills/h1bomb/qwen3-tts-mlx

Then install the Python dependency:

pip install mlx-audio

Finally, confirm that ffmpeg is on your system path. If it is not, install it with Homebrew:

brew install ffmpeg
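The prerequisites above can be checked before a first run. The following is a minimal sketch, not part of the skill itself: the function name check_environment is hypothetical, and it assumes the mlx-audio package is importable as mlx_audio. Note that a Python interpreter running under Rosetta reports x86_64 even on an Apple Silicon Mac, so a failed apple_silicon check may point to the interpreter rather than the hardware.

```python
import importlib.util
import platform
import shutil


def check_environment() -> dict:
    """Return a pass/fail flag for each prerequisite named in the install notes."""
    return {
        # Apple Silicon Macs report Darwin/arm64 to a native Python interpreter.
        "apple_silicon": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        # Assumption: `pip install mlx-audio` provides a module named mlx_audio.
        "mlx_audio": importlib.util.find_spec("mlx_audio") is not None,
        # ffmpeg must be discoverable on the system path.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Any item reported as MISSING maps directly to one of the install steps above.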

Use Cases

This skill is perfect for creators and developers who need robust audio output without external API dependencies. Common use cases include:

Media Production: Generating video voiceovers, character lines for games, or podcast intros.
Accessibility: Creating audiobooks or turning document summaries into spoken audio for people who rely on listening rather than reading.
Prototyping: Rapid iteration of voice interfaces or synthetic narration for local AI agents.
Localization: Providing consistent multilingual support for content creators targeting global audiences.

Example Prompts

"Generate an energetic English narration for my product video using the Ryan voice, keep it under 30 seconds."
"Use the Uncle_Fu voice to read this news script in Chinese, and apply a calm, professional news anchor tone."
"Create a voice clone from my file at 'reference.wav' and use it to read the following text: 'Welcome to the local AI future.'"

Tips & Limitations

Memory Management: Make sure enough memory is free for the model you choose: the CustomVoice model requires ~4GB, while the VoiceDesign model requires ~5GB. If your system has 8GB of total memory, close resource-heavy applications before generation.
Style Control: The --instruct flag sets the delivery style. Use descriptive emotional cues (e.g., 'excited', 'whispering', 'authoritative') to get the best results.
Hardware: This skill requires Apple Silicon (M1/M2/M3/M4). It will not function on Intel-based Macs or non-Apple hardware. Select the variant (Base, CustomVoice, or VoiceDesign) that fits your available unified memory.
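The memory guidance can be turned into a quick check. This is a rough sketch under stated assumptions: the 2GB headroom for the OS and other applications is an illustrative value, not a figure from the skill's documentation, and the function names are hypothetical. The Base variant is left out because no memory figure is given for it here.

```python
import os

# Approximate requirements quoted in the tips (GB). The Base variant is
# omitted because the text gives no figure for it.
MODEL_MEMORY_GB = {"CustomVoice": 4.0, "VoiceDesign": 5.0}


def total_memory_gb() -> float:
    """Total physical memory in GiB, read through POSIX sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def variants_that_fit(total_gb: float, headroom_gb: float = 2.0) -> list:
    """List variants whose quoted requirement still leaves headroom_gb free.

    headroom_gb is an assumed allowance for the OS and other applications.
    """
    return [
        name
        for name, need in MODEL_MEMORY_GB.items()
        if need + headroom_gb <= total_gb
    ]


if __name__ == "__main__":
    print(variants_that_fit(total_memory_gb()))
```

On a machine near the limit, closing other applications matters more than the exact headroom value, since the check looks at total rather than currently free memory.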

Metadata

Author: @h1bomb

Stars: 2387

Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-h1bomb-qwen3-tts-mlx": {
      "enabled": true,
      "auto_update": true
    }
  }
}
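If clawhub.json already lists other plugins, pasting the snippet over the file would drop them. A small sketch of merging the entry instead is shown here. The helper names enable_plugin and update_file are hypothetical, and the location of clawhub.json is not stated on this page, so the path is left to the caller.

```python
import json
from pathlib import Path

PLUGIN_KEY = "official-h1bomb-qwen3-tts-mlx"


def enable_plugin(config: dict) -> dict:
    """Return a copy of config with the plugin entry added.

    Existing plugins and unrelated top-level keys are preserved.
    """
    plugins = dict(config.get("plugins", {}))
    plugins[PLUGIN_KEY] = {"enabled": True, "auto_update": True}
    return {**config, "plugins": plugins}


def update_file(path: Path) -> None:
    """Merge the entry into the JSON file at path, creating it if absent."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Calling update_file(Path("clawhub.json")) from the directory that holds the file would apply the change in place; back the file up first if it is hand-edited.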

Tags (AI)

#tts #speech-synthesis #mlx #apple-silicon #audio

Safety Score: 5/5

Flags: file-write, file-read, code-execution