
qwen-tts

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.

Why use this skill?

Generate high-quality, offline text-to-speech with OpenClaw. Features 9 premium voices, 10 languages, and emotional voice control using the powerful Qwen3-TTS-12Hz-1.7B model.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/paki81/qwen-tts

What This Skill Does

The qwen-tts skill is a high-performance, local text-to-speech engine powered by the Qwen3-TTS-12Hz-1.7B-CustomVoice model. It allows OpenClaw users to generate high-quality, expressive synthetic audio directly on their local hardware, bypassing the need for expensive and privacy-invasive cloud APIs like ElevenLabs. With support for 10 languages—including Italian, English, and Japanese—and a suite of 9 distinct premium speaker voices, this skill provides unparalleled control over vocal output, including emotional nuances and stylistic delivery via instruction-based prompts.

Installation

To integrate this skill into your OpenClaw environment:

  • Install the skill: clawhub install openclaw/skills/skills/paki81/qwen-tts
  • Navigate to the skill directory at skills/public/qwen-tts and run bash scripts/setup.sh. This initializes a dedicated virtual environment and downloads the necessary dependencies.
  • The first time you run a speech-synthesis task, the system automatically downloads the 1.7GB model weight file from Hugging Face. Ensure you have sufficient disk space and a stable internet connection for this one-time initial setup.
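Since the first synthesis triggers the 1.7GB weight download, it can help to verify free disk space up front. The following minimal sketch is illustrative and not part of the skill itself; the path and the 2GB threshold are assumptions, and it uses only Python's standard library:

```python
# Pre-flight check before the one-time Qwen3-TTS model download (~1.7 GB).
# The default path "." and 2.0 GB threshold are illustrative assumptions.
import shutil

def enough_disk_space(path: str = ".", required_gb: float = 2.0) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3

if enough_disk_space():
    print("OK: enough free space for the model download")
else:
    print("Warning: free up disk space before the first synthesis task")
```

Running this from the skill directory before scripts/setup.sh gives an early warning instead of a failed download midway.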

Use Cases

This skill is perfect for creators needing local, offline voiceovers for multimedia projects. It is ideal for developers building voice-enabled applications, creating dynamic accessibility features for desktop tools, or generating interactive narrations within OpenClaw workflows. Because the model runs locally, it is suitable for sensitive data where privacy is paramount, as no audio data is transmitted to external servers.

Example Prompts

  • "OpenClaw, use qwen-tts to generate an Italian audio file saying 'Benvenuto nel futuro del text-to-speech' using the Vivian voice and save it as welcome.wav."
  • "Create a voice message using the Ryan speaker in English that says 'Hello, nice to meet you' with an enthusiastic and energetic tone."
  • "Please list all available speakers for the qwen-tts module so I can choose the best voice for my narrations."

Tips & Limitations

For optimal results, prefer a speaker's native language, although the model is cross-lingually capable. Use the -i flag to control output style, for example 'Parla con entusiasmo' for Italian or 'Read like a narrator' for English. Because the 1.7B-parameter model runs entirely on local hardware, ensure your system has sufficient RAM to process requests smoothly. As an offline model, it is limited to the predefined voice library and does not perform voice cloning of arbitrary audio samples.
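The -i flag described above can be composed into a full invocation programmatically. This is a hypothetical sketch: only the -i instruction flag is documented here, while the script name (scripts/tts.py) and the -t/-s/-o flags are assumptions for illustration, not the skill's confirmed interface:

```python
# Hypothetical helper assembling a command line for the skill's CLI.
# Only -i (style/emotion instruction) is documented; the script path and
# the -t / -s / -o flag names are illustrative assumptions.
from typing import List, Optional

def build_tts_command(text: str, speaker: str, out_path: str,
                      instruction: Optional[str] = None) -> List[str]:
    cmd = ["python", "scripts/tts.py",
           "-t", text,        # text to synthesize
           "-s", speaker,     # one of the 9 premium voices, e.g. "Vivian"
           "-o", out_path]    # output WAV file
    if instruction:
        # Style/emotion control, e.g. "Parla con entusiasmo"
        cmd += ["-i", instruction]
    return cmd

print(build_tts_command("Benvenuto nel futuro del text-to-speech",
                        "Vivian", "welcome.wav",
                        instruction="Parla con entusiasmo"))
```

Building the argument list this way (rather than interpolating a shell string) avoids quoting problems when instructions contain spaces or non-ASCII text.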

Metadata

Author: @paki81
Stars: 1249
Views: 0
Updated: 2026-02-21
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-paki81-qwen-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#tts #offline #audio #qwen #voice
Safety Score: 4/5

Flags: file-write, file-read, code-execution