qwenspeak
Text-to-speech generation via Qwen3-TTS over SSH. Preset voices, voice cloning, voice design. Use when the user wants to generate speech audio, clone voices, or work with TTS.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/psyb0t/qwenspeak
qwenspeak
YAML-driven text-to-speech over SSH using Qwen3-TTS models.
For installation and deployment, see references/setup.md.
SSH Wrapper
Use scripts/qwenspeak.sh for all commands. It reads the server host and port from the QWENSPEAK_HOST and QWENSPEAK_PORT environment variables and handles host key acceptance for you.
scripts/qwenspeak.sh <command> [args]
scripts/qwenspeak.sh <command> < input_file
scripts/qwenspeak.sh <command> > output_file
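A minimal sketch of setting up the environment before calling the wrapper. The hostname and port are placeholders, not values from this skill, and `require_qwenspeak_env` is a hypothetical helper. It assumes both variables must be set, which may not hold if the wrapper provides its own defaults.

```shell
# Placeholder values: replace with your own server's address and SSH port.
export QWENSPEAK_HOST="tts.example.com"
export QWENSPEAK_PORT="2222"

# Hypothetical helper: abort with a clear message if either variable is
# unset or empty (assumes the wrapper has no built-in defaults).
require_qwenspeak_env() {
  : "${QWENSPEAK_HOST:?set QWENSPEAK_HOST to the server hostname}"
  : "${QWENSPEAK_PORT:?set QWENSPEAK_PORT to the SSH port}"
}
```

Calling `require_qwenspeak_env` at the top of a script makes a missing setting fail immediately rather than partway through an SSH call.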
TTS Generation
Submit YAML, get a job UUID back immediately, poll for progress. Jobs run sequentially — one at a time, the rest queue up.
# Get the YAML template
scripts/qwenspeak.sh "tts print-yaml" > job.yaml
# Submit job
scripts/qwenspeak.sh "tts" < job.yaml
# {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}
# Check progress
scripts/qwenspeak.sh "tts get-job 550e8400"
# Follow job log
scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"
# Download result
scripts/qwenspeak.sh "get hello.wav" > hello.wav
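The submit-then-poll flow above could be scripted roughly as follows. This is a sketch under stated assumptions: `json_field` and `submit_and_wait` are hypothetical helpers, `tts get-job` is assumed to print JSON with a `status` field like the submit response does, and `running` is a guessed name for the in-progress state. Only `queued` appears in the example output above.

```shell
# Path to the wrapper; override QWENSPEAK to point elsewhere.
QWENSPEAK="${QWENSPEAK:-scripts/qwenspeak.sh}"

# Print the value of a string field from single-line JSON on stdin.
# A sed-based sketch, not a full JSON parser; prefer jq if you have it.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Submit a job YAML, then poll until the status leaves the waiting states.
# Usage: submit_and_wait job.yaml
submit_and_wait() {
  local job_id status
  job_id=$("$QWENSPEAK" "tts" < "$1" | json_field id)
  [ -n "$job_id" ] || { echo "no job id returned" >&2; return 1; }
  while :; do
    status=$("$QWENSPEAK" "tts get-job $job_id" | json_field status)
    case "$status" in
      # ASSUMPTION: "running" is a hypothetical status name.
      queued|running) sleep "${POLL_INTERVAL:-5}" ;;
      # Any other status is treated as final.
      *) break ;;
    esac
  done
  echo "$job_id $status"
}
```

The function prints the job ID and the last status it saw, so the caller can decide whether to download results with `get` or inspect the log with `tts get-job-log`.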
YAML Structure
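Global settings plus a list of steps. Each step loads a model, runs all its generations, then unloads it. Settings cascade from global to step to generation, and a more specific level overrides a broader one (in the example, the second generation's speaker replaces the step's speaker).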
steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav
      - text: "I cannot believe this!"
        speaker: Vivian
        instruct: "Speak angrily"
        output: angry.wav
  - mode: voice-design
    generate:
      - text: "Welcome to our store."
        instruct: "A warm, friendly young female voice with a cheerful tone"
        output: welcome.wav
  - mode: voice-clone
    model_size: 1.7b
    ref_audio: ref.wav
    ref_text: "Transcript of reference"
    generate:
      - text: "First line in cloned voice"
        output: clone1.wav
      - text: "Second line"
        output: clone2.wav
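For quick one-off jobs, the YAML can be generated from the shell instead of edited by hand. The sketch below uses only keys that appear in the example above. `make_custom_voice_job` is a hypothetical helper, and it does not escape double quotes inside the text argument.

```shell
# Print a one-step, one-generation custom-voice job to stdout.
# Usage: make_custom_voice_job SPEAKER TEXT OUTPUT_FILE
# Note: TEXT is inserted verbatim, so it must not contain double quotes.
make_custom_voice_job() {
  cat <<EOF
steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: $1
    language: English
    generate:
      - text: "$2"
        output: $3
EOF
}
```

For example, `make_custom_voice_job Ryan "Hello world" hello.wav > job.yaml` writes a file that can be submitted with `scripts/qwenspeak.sh "tts" < job.yaml`.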
Modes
custom-voice — Pick from 9 preset speakers. 1.7B supports emotion/style via instruct.
voice-design — Describe the voice in natural language via instruct. 1.7B only.
voice-clone — Clone from reference audio. Set ref_audio and ref_text at step level to reuse across generations. x_vector_only: true skips transcript.
Emotion trick for cloned voices
Upload reference clips recorded with different emotions, then use a separate step for each:
scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav
steps:
  - mode: voice-clone
    ref_audio: refs/happy.wav
    ref_text: "transcript of happy ref"
    generate:
      - text: "Great news everyone!"
        output: happy1.wav
  - mode: voice-clone
    ref_audio: refs/angry.wav
    ref_text: "transcript of angry ref"
    generate:
      - text: "This is unacceptable"
        output: angry1.wav
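With more than a couple of emotions, the per-emotion steps become repetitive and can be generated instead. The sketch below assumes each reference was uploaded as refs/EMOTION.wav, as in the commands above. `clone_step` and `emotion_job` are hypothetical helpers, and the transcripts are the same placeholders used in the example.

```shell
# Print one voice-clone step that uses refs/<emotion>.wav as its reference.
# Usage: clone_step EMOTION REF_TRANSCRIPT TEXT OUTPUT_FILE
clone_step() {
  cat <<EOF
  - mode: voice-clone
    ref_audio: refs/$1.wav
    ref_text: "$2"
    generate:
      - text: "$3"
        output: $4
EOF
}

# Print a complete job with one step per emotion.
emotion_job() {
  echo "steps:"
  clone_step happy "transcript of happy ref" "Great news everyone!" happy1.wav
  clone_step angry "transcript of angry ref" "This is unacceptable" angry1.wav
}
```

Running `emotion_job > job.yaml` produces the same structure as the hand-written YAML above, ready to submit with `scripts/qwenspeak.sh "tts" < job.yaml`.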
Job Management
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-psyb0t-qwenspeak": {
      "enabled": true,
      "auto_update": true
    }
  }
}