qwen3-tts

High-quality text-to-speech using Qwen3-TTS: 10 built-in speakers with emotional instruct control, voice cloning from 3 s of audio, natural-language voice design, 10+ languages, persistent named voices, and audio delivery via Telegram/WhatsApp as native voice messages. Auto-detects the compute backend (CUDA, ROCm, Intel XPU, or CPU-only).

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/damustermann/claw-qwen3-tts

Qwen3-TTS Skill

You have access to a powerful text-to-speech system that can generate human-quality speech with 10 built-in speakers, design new voices from descriptions, clone existing voices from audio samples, and send audio via Telegram/WhatsApp as native voice messages.

First-Time Setup

If the skill is not yet installed (no ~/clawd/skills/qwen3-tts directory), run:

bash <(curl -fsSL https://raw.githubusercontent.com/daMustermann/claw-qwen3-tts/main/install.sh)

Or, if the repository is already cloned but not yet set up (no .venv/ directory inside it):

bash ~/clawd/skills/qwen3-tts/install.sh

This auto-detects the GPU (CUDA, ROCm, Intel XPU, or CPU-only), creates a Python venv, and installs all dependencies. It takes 5–15 minutes on first run.
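
The two checks described above (skill directory present, .venv/ present) can be sketched as a small helper. This is only an illustration: the function name setup_state and its three output labels are hypothetical and not part of the skill.

```shell
# Report which setup step the skill directory still needs.
# Prints one of: not-installed, needs-setup, ready.
# (Hypothetical helper; the labels are made up for this sketch.)
setup_state() {
  local skill_dir="${1:-$HOME/clawd/skills/qwen3-tts}"
  if [ ! -d "$skill_dir" ]; then
    echo "not-installed"   # run the remote install.sh one-liner
  elif [ ! -d "$skill_dir/.venv" ]; then
    echo "needs-setup"     # run the local install.sh
  else
    echo "ready"
  fi
}
```

Calling setup_state with no argument inspects ~/clawd/skills/qwen3-tts, so the result can be used to pick between the two install commands shown above.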

Starting & Stopping the Server

Before any TTS operation, ensure the server is running:

# Start (idempotent — won't restart if already running)
bash ~/clawd/skills/qwen3-tts/scripts/start_server.sh

# Check health
bash ~/clawd/skills/qwen3-tts/scripts/health_check.sh

# Stop (when done)
bash ~/clawd/skills/qwen3-tts/scripts/stop_server.sh

The server runs at http://localhost:8880.
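
Model loading may take a moment after start_server.sh returns, so it can help to poll the health check before sending requests. The sketch below is a generic retry loop, not something the skill ships. The name wait_for_server is hypothetical, and it assumes health_check.sh exits non-zero while the server is not ready, which the source does not state.

```shell
# Retry a health-check command until it succeeds or attempts run out.
# Usage: wait_for_server ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Assumption: COMMAND exits 0 when healthy and non-zero otherwise.
wait_for_server() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for_server 30 2 bash ~/clawd/skills/qwen3-tts/scripts/health_check.sh` would try up to 30 times, two seconds apart.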

Available Models

Model ID	Use Case	Notes
`custom-voice-1.7b`	High-quality TTS with built-in speakers — default	Best quality, ~5 GB VRAM
`custom-voice-0.6b`	Fast TTS with built-in speakers	Lightweight, ~2 GB VRAM
`voice-design`	Design new voices from natural language descriptions	Uses VoiceDesign model
`base-1.7b`	Basic TTS (auto-corrected to `custom-voice-1.7b`)	Use `custom-voice-*` instead
`base-0.6b`	Basic TTS (auto-corrected to `custom-voice-0.6b`)	Use `custom-voice-*` instead

Important: On the /v1/audio/speech endpoint, base-* and voice-design models are automatically corrected to the corresponding custom-voice-* model. Always prefer custom-voice-1.7b or custom-voice-0.6b for speech generation.
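
To avoid relying on server-side correction, a client can normalize the model ID before sending a request. The sketch below mirrors only the two base-* rows of the table; resolve_speech_model is a hypothetical name. It deliberately leaves voice-design untouched, because the source does not say which custom-voice-* size it is corrected to.

```shell
# Map a requested model ID to the custom-voice model listed in the table.
# Only the base-* rows are mapped; every other ID is passed through as-is.
resolve_speech_model() {
  case "$1" in
    base-1.7b) echo "custom-voice-1.7b" ;;
    base-0.6b) echo "custom-voice-0.6b" ;;
    *)         echo "$1" ;;
  esac
}
```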

Built-in Speakers

The custom-voice-* models include 10 built-in voices:

Chelsie · Ethan · Aidan · Serena · Ryan · Vivian · Claire · Lucas · Eleanor · Benjamin

You can discover speakers dynamically: curl http://localhost:8880/v1/speakers
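
When the server is not reachable, a request can still be sanity-checked against the static list above. This is a minimal sketch with a hypothetical function name, is_builtin_speaker. It assumes speaker names are matched exactly as written (case-sensitive), which the source does not confirm.

```shell
# Return 0 if the name is one of the 10 built-in speakers listed above.
# Assumption: names are case-sensitive and spelled exactly as in the list.
is_builtin_speaker() {
  case "$1" in
    Chelsie|Ethan|Aidan|Serena|Ryan|Vivian|Claire|Lucas|Eleanor|Benjamin)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

The /v1/speakers endpoint remains the authoritative source; the hard-coded list only reflects the names given in this document.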

Capabilities

1. Generate Speech from Text

When to use: User asks to speak text, read something aloud, generate audio, do a voiceover, narrate, or say something.

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-voice-1.7b",
    "input": "TEXT_HERE",
    "voice": "default",
    "speaker": "Chelsie",
    "language": "en",
    "instruct": "",
    "response_format": "wav"
  }' \
  --output ~/clawd/skills/qwen3-tts/output/speech.wav
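
Because TEXT_HERE is pasted into a JSON string, text containing double quotes or backslashes will break the request unless it is escaped first. The sketch below builds the same request body as the curl command above. The helper names json_escape and build_speech_payload are hypothetical, and the escaping covers only backslashes and double quotes, not newlines or other control characters.

```shell
# Escape backslashes and double quotes for use inside a JSON string.
# Newlines and other control characters are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body with the same fields as the curl example.
# Usage: build_speech_payload TEXT [SPEAKER] [INSTRUCT]
build_speech_payload() {
  local text speaker instruct
  text="$(json_escape "$1")"
  speaker="$(json_escape "${2:-Chelsie}")"
  instruct="$(json_escape "${3:-}")"
  printf '{"model":"custom-voice-1.7b","input":"%s","voice":"default","speaker":"%s","language":"en","instruct":"%s","response_format":"wav"}' \
    "$text" "$speaker" "$instruct"
}
```

The output can be passed to the request with `-d "$(build_speech_payload "Hello there" Ryan)"` in place of the inline JSON.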

Parameters: see the full documentation on GitHub.

Metadata

Author: @damustermann

Stars: 2102

Updated: 2026-03-06

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-damustermann-claw-qwen3-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Related Skills

narrator-ai-cli

Create AI-narrated film/drama commentary videos via CLI. Two workflow paths (Original & Adapted narration), 100+ movies, 146 BGM tracks, 63 dubbing voices in 11 languages, 90+ narration templates. Use when creating narration videos, film commentary, short drama dubbing, or video production.

4myhime 4473

Lead Radar

Every morning, scans Reddit, Hacker News, Indie Hackers, Stack Overflow, Quora, Hashnode, Dev.to, GitHub, and Lobsters for people actively asking for what you sell. Delivers the top 10 buying-intent leads to your Telegram with a pre-drafted reply. Powered by Gemini 2.5 Flash.

bencpnd 4473

narrator-ai-cli

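AI skill for automatically generating movie commentary videos (AI解说大师 CLI Skill). Triggered when the user needs to create movie commentary videos, short-drama commentary, derivative film and TV edits, AI-dubbed narration videos, film commentary, video narration, drama dubbing, or movie narration. Includes footage from 93 movies, 146 BGM tracks, 63 dubbing voices (11 languages), and 90+ commentary templates. The narrator-ai-cli command-line tool automates the full workflow: search and pick a film → choose a template → choose BGM → choose a voice → generate the script → render the video. CLI client for Narrator AI (AI解说大师) video narration API. Use when user needs to create AI narration videos, manage narration tasks, browse dubbing/BGM/material resources, or automate video production.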

4myhime 4473

podcast-agent

Search articles on any topic, generate a two-host dialogue script, and synthesize podcast audio via TTS. Turn long reads into listenable content.

besty0121 4473

agent3-hub

Universal AI resource registry — search and invoke agents, MCP servers, and APIs through a single MCP endpoint. Includes Telegram content search, Google search, X/Twitter search, and more.

agent3-666 4473
