Official Verified media Safety 5/5

qwen3-tts-mlx

Local Qwen3-TTS speech synthesis on Apple Silicon via MLX. Use for offline narration, audiobooks, video voiceovers, and multilingual TTS.

Why use this skill?

Generate high-quality, multilingual speech locally on your Mac using the Qwen3-TTS MLX skill. It is well suited to offline narration, voice cloning, and audio content creation.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/h1bomb/qwen3-tts-mlx

What This Skill Does

The qwen3-tts-mlx skill runs Qwen3-TTS speech synthesis locally on Apple Silicon hardware using the MLX framework. It supports 11 languages, voice cloning, and voice design. Unlike cloud-based TTS services, it runs entirely offline, which keeps your text and audio on your machine and avoids network latency and per-request costs. It uses the unified memory architecture of M-series chips to generate high-quality, expressive speech for media projects.

Installation

This skill requires an Apple Silicon Mac. First, run the installation command within your OpenClaw environment:

clawhub install openclaw/skills/skills/h1bomb/qwen3-tts-mlx

Then install the Python dependency and make sure ffmpeg is available on your system path:

pip install mlx-audio
brew install ffmpeg
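The installation requirements can be checked before the first run. The following is a minimal sketch, not part of the skill itself: the `preflight` function is a hypothetical helper, and it assumes that the `mlx-audio` package is importable under the module name `mlx_audio`.

```python
import importlib.util
import platform
import shutil


def preflight(system=None, machine=None, has_ffmpeg=None, has_mlx_audio=None):
    """Return a list of problems; an empty list means the prerequisites look met.

    All arguments are optional overrides so the logic can be tested
    without running on a Mac; by default the live system is inspected.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if has_ffmpeg is None:
        has_ffmpeg = shutil.which("ffmpeg") is not None
    if has_mlx_audio is None:
        # Assumption: `pip install mlx-audio` provides the module `mlx_audio`.
        has_mlx_audio = importlib.util.find_spec("mlx_audio") is not None

    problems = []
    # A Python build running under Rosetta reports x86_64 here,
    # so a native arm64 interpreter is expected.
    if system != "Darwin" or machine != "arm64":
        problems.append(f"Apple Silicon Mac required (found {system}/{machine})")
    if not has_mlx_audio:
        problems.append("mlx-audio not found: pip install mlx-audio")
    if not has_ffmpeg:
        problems.append("ffmpeg not on PATH: brew install ffmpeg")
    return problems
```

Calling `preflight()` with no arguments inspects the current machine; printing each returned string gives a short to-do list before installing the skill.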

Use Cases

This skill is perfect for creators and developers who need robust audio output without external API dependencies. Common use cases include:

  • Media Production: Generating video voiceovers, character lines for games, or podcast intros.
  • Accessibility: Creating audiobooks or reading document summaries aloud for people who rely on screen readers.
  • Prototyping: Rapid iteration of voice interfaces or synthetic narration for local AI agents.
  • Localization: Providing consistent multilingual support for content creators targeting global audiences.

Example Prompts

  1. "Generate an energetic English narration for my product video using the Ryan voice, keep it under 30 seconds."
  2. "Use the Uncle_Fu voice to read this news script in Chinese, and apply a calm, professional news anchor tone."
  3. "Create a voice clone from my file at 'reference.wav' and use it to read the following text: 'Welcome to the local AI future.'"

Tips & Limitations

  • Memory Management: Make sure you have enough free memory for the model variant you choose; the CustomVoice model requires ~4 GB, while the VoiceDesign model requires ~5 GB. If your system has only 8 GB of total memory, close resource-heavy applications before generation.
  • Style Control: The --instruct flag is powerful. Be descriptive with your emotional cues (e.g., 'excited', 'whispering', 'authoritative') to get the best results.
  • Hardware: This skill runs only on Apple Silicon (M1/M2/M3/M4). It will not function on Intel-based Macs or non-Apple hardware. Apple Silicon has no separate VRAM, so select the variant (Base vs. CustomVoice vs. VoiceDesign) that best fits your available unified memory.
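The memory tip can be turned into a quick arithmetic check. This is a rough sketch under stated assumptions: `fits_in_memory` is a hypothetical helper, the footprints are the approximate figures quoted above, and the 3.5 GB reserved for macOS and other applications is an illustrative guess rather than a measured value. The Base variant is omitted because its footprint is not stated here.

```python
# Approximate model footprints in GB, taken from the tips above.
MODEL_FOOTPRINT_GB = {"CustomVoice": 4.0, "VoiceDesign": 5.0}


def fits_in_memory(variant, total_gb, reserved_gb=3.5):
    """Return True if the variant fits in total memory minus a reserve.

    reserved_gb is a hypothetical allowance for the OS and open apps;
    adjust it to match what is actually running on your machine.
    """
    if variant not in MODEL_FOOTPRINT_GB:
        raise ValueError(f"no footprint recorded for variant: {variant}")
    return MODEL_FOOTPRINT_GB[variant] <= total_gb - reserved_gb


for name in MODEL_FOOTPRINT_GB:
    print(name, fits_in_memory(name, total_gb=8))
```

Under these assumptions, an 8 GB machine has room for CustomVoice but not VoiceDesign, which is why closing other applications (effectively lowering the reserve) matters on smaller configurations.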

Metadata

Author: @h1bomb
Stars: 2387
Views: 0
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-h1bomb-qwen3-tts-mlx": {
      "enabled": true,
      "auto_update": true
    }
  }
}
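If your clawhub.json already lists other plugins, the entry needs to be merged in rather than pasted over the file. The sketch below shows one way to do that on an in-memory dictionary; `enable_plugin` and the `some-other-plugin` entry are hypothetical, and reading or writing the actual file is left out because its location is not specified here.

```python
import json

PLUGIN_ID = "official-h1bomb-qwen3-tts-mlx"


def enable_plugin(config):
    """Return a copy of config with the plugin entry added or updated."""
    merged = dict(config)
    # Copy the plugins mapping so the caller's dictionary is not mutated.
    plugins = dict(merged.get("plugins", {}))
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


# Hypothetical existing configuration with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
print(json.dumps(enable_plugin(existing), indent=2))
```

The existing plugin entries are preserved, and running the function twice yields the same result as running it once.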

Tags (AI)

#tts #speech-synthesis #mlx #apple-silicon #audio
Safety Score: 5/5

Flags: file-write, file-read, code-execution