speakturbo-tts
Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast text-to-speech (TTS) with ~90ms latency and 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
Why use this skill?
Give your OpenClaw agent instant voice capabilities with speakturbo-tts. Experience ~90ms latency, 8 built-in voices, and efficient audio output for real-time interaction.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/emzod/speakturbo-tts
What This Skill Does
speakturbo-tts is a low-latency text-to-speech engine for the OpenClaw ecosystem. Designed for real-time interaction, it gives your AI agent a voice with ~90ms latency once the daemon is warmed up. A lightweight Rust CLI wrapper communicates with a persistent Python-based daemon built on the pocket-tts architecture, which keeps synthesis fast. With 8 built-in voices, you can change your agent's persona instantly. The skill suits developers building voice-responsive interfaces, interactive dashboards, or real-time notification systems where waiting for cloud-based synthesis would disrupt the user flow.
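The difference between a cold first call and a warm call can be checked with a small timing helper. The sketch below is not part of speakturbo-tts: `time_command` is a hypothetical helper, and the `speakturbo` binary name shown in the comment is an assumption, so substitute whatever command your installation provides. It measures whole-process wall-clock time, which includes process startup and playback and is therefore an upper bound rather than the synthesis latency itself.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return wall-clock seconds for each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. Replace it with your
    # TTS invocation, e.g. ["speakturbo", "-q", "hello"] (binary name assumed).
    first, *later = time_command([sys.executable, "-c", "pass"], runs=3)
    print(f"first run: {first * 1000:.1f} ms")
    print(f"later runs: {[f'{t * 1000:.1f} ms' for t in later]}")
```

With a real TTS command, the first timing should reflect the daemon warm-up and the later ones the warm path.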
Installation
To integrate this skill into your environment, run the following command in your terminal:
clawhub install openclaw/skills/skills/emzod/speakturbo-tts
Make sure the system audio dependencies needed to play the 24kHz mono output stream are installed. The daemon is spawned automatically the first time the skill runs.
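If you save output to a file, one way to confirm it matches the 24kHz mono format is to read the WAV header with Python's standard `wave` module. This is only a sketch: `describe_wav` and `write_silence` are hypothetical helpers, and the demo writes its own one-second silent file as a stand-in for real output. The 16-bit sample width is an assumption, since the skill description states only the sample rate and channel count.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def write_silence(path, seconds=1, rate=24000):
    """Write a silent mono WAV as a stand-in for real TTS output."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit (assumed)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.wav")
    write_silence(path)
    channels, rate, duration = describe_wav(path)
    print(f"channels={channels} rate={rate} duration={duration:.2f}s")
```

Point `describe_wav` at a file produced by the skill to check that it reports one channel at 24000 Hz.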
Use Cases
- Real-Time AI Assistants: Provide instant auditory feedback for your Claude or local LLM instances, making the AI feel more present and reactive.
- Accessibility Tools: Use the text-to-speech capability to read logs, system warnings, or chat responses aloud for visually impaired users or for hands-free workflows.
- Event Notifications: Trigger vocal alerts for system events, build completions, or time-sensitive task reminders.
- Rapid Prototyping: Quickly add synthetic voice output to automation scripts without needing external API keys or complex cloud configurations.
Example Prompts
- "Speak the current status of my system build using the marius voice."
- "Read the last five lines of the output log aloud so I can listen while I work."
- "Summarize the latest project updates and speak them using the alba voice."
Tips & Limitations
The first execution takes 2-5 seconds while the daemon initializes and loads the model into memory; plan for this if you use the skill in a startup sequence. To maximize performance, keep the daemon warm. Use the -q flag for cleaner terminal output when integrating into a larger automated pipeline. Be mindful of file system security: the tool enforces an allowlist for writing .wav files. If you encounter errors when saving files, update ~/.speakturbo/config to include your intended directory paths.
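The idea behind a directory allowlist can be illustrated as follows. This is a sketch of the general technique, not the tool's actual implementation, and it says nothing about the real format of ~/.speakturbo/config; `is_write_allowed` is a hypothetical name. The target path is resolved before comparison so that `..` segments or symlinks cannot escape an allowed directory.

```python
import os
import tempfile
from pathlib import Path


def is_write_allowed(target, allowed_dirs):
    """Return True if `target` resolves to a path inside an allowed directory."""
    resolved = Path(target).expanduser().resolve()
    for directory in allowed_dirs:
        base = Path(directory).expanduser().resolve()
        try:
            # relative_to raises ValueError when resolved is outside base.
            resolved.relative_to(base)
            return True
        except ValueError:
            continue
    return False


if __name__ == "__main__":
    audio_dir = tempfile.mkdtemp()  # stand-in for an allowlisted directory
    inside = os.path.join(audio_dir, "out.wav")
    outside = os.path.join(audio_dir, "..", "out.wav")
    print(is_write_allowed(inside, [audio_dir]))
    print(is_write_allowed(outside, [audio_dir]))
```

A check like this explains why saving to an unlisted directory fails until that directory is added to the configuration.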
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-emzod-speakturbo-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: file-write, file-read