What This Skill Does

speakturbo-tts is a low-latency text-to-speech engine for the OpenClaw ecosystem. Built for real-time interaction, it gives your AI agent a voice with roughly 90 ms latency once the daemon is warmed up. A lightweight Rust CLI wrapper communicates with a persistent Python daemon, which uses the pocket-tts architecture for audio synthesis. Eight built-in voices let you change your agent's persona instantly. The skill suits developers building voice-responsive interfaces, interactive dashboards, or real-time notification systems, where waiting for cloud-based synthesis would disrupt the user flow.

Installation

To integrate this skill into your environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/emzod/speakturbo-tts

Ensure that the system audio dependencies needed to play the 24 kHz mono output stream are installed. The skill spawns the daemon automatically on its first execution.
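To check that your audio stack can play this format before relying on the skill, you can generate a short test file with Python's standard library. This is a minimal sketch: the 24 kHz mono figures come from the description above, while the 16-bit PCM sample width, the 440 Hz tone, and the function name are assumptions chosen for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000   # Hz, from the skill description
CHANNELS = 1           # mono, from the skill description
SAMPLE_WIDTH = 2       # bytes per sample; 16-bit PCM is an assumption


def write_test_tone(path, seconds=1.0, freq=440.0):
    """Write a sine tone as a 24 kHz mono WAV file and return the frame count."""
    n_frames = int(SAMPLE_RATE * seconds)
    frames = bytearray()
    for i in range(n_frames):
        # Scale to 30% of full range to keep the tone at a comfortable volume.
        sample = int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return n_frames
```

Calling write_test_tone("test_tone.wav") and opening the file in any system player is a quick way to confirm that 24 kHz mono playback works on your machine.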

Use Cases

Real-Time AI Assistants: Provide instant auditory feedback for your Claude or local LLM instances, making the AI feel more present and reactive.
Accessibility Tools: Use the text-to-speech capability to read logs, system warnings, or chat responses aloud for visually impaired users or for hands-free workflows.
Event Notifications: Trigger vocal alerts for system events, build completions, or time-sensitive task reminders.
Rapid Prototyping: Quickly add synthetic voice output to automation scripts without needing external API keys or complex cloud configurations.
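For the notification and prototyping cases, an automation script could wrap the CLI in a small helper. The following is a sketch under stated assumptions, not the skill's documented interface: the binary name speakturbo and the convention of passing the text as a positional argument are hypothetical, and only the -q flag is mentioned in this document. Check the skill's own help output before using it.

```python
import shutil
import subprocess

# Assumptions: the binary name and the positional-text calling convention are
# hypothetical; only the -q flag is mentioned in this document.
CLI_NAME = "speakturbo"


def build_speak_argv(text, quiet=True, cli=CLI_NAME):
    """Build the argument list for one spoken notification."""
    argv = [cli]
    if quiet:
        argv.append("-q")
    argv.append(text)
    return argv


def notify(text, quiet=True, cli=CLI_NAME):
    """Speak `text` if the CLI is on PATH; return True if it ran successfully."""
    if shutil.which(cli) is None:
        # CLI not installed: skip the alert rather than fail the calling script.
        return False
    result = subprocess.run(build_speak_argv(text, quiet, cli), check=False)
    return result.returncode == 0
```

A build script could then call notify("Build finished") as its last step. Passing the arguments as a list avoids shell quoting problems when the text contains spaces or punctuation.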

Example Prompts

"Speak the current status of my system build using the marius voice."
"Read the last five lines of the output log aloud so I can listen while I work."
"Summarize the latest project updates and speak them using the alba voice."

Tips & Limitations

The first execution takes 2-5 seconds to start the daemon and load the model into memory; plan for this if you use the skill in a startup sequence. To maximize performance, keep the daemon warm. Use the -q flag for cleaner terminal output when integrating into a larger automated pipeline. Be mindful of file system security: the tool enforces an allowlist for writing .wav files. If you encounter errors when saving files, add the intended directory paths to your ~/.speakturbo/config file.
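The idea behind a directory allowlist can be illustrated in a few lines. The sketch below is not the tool's actual implementation, and it does not read ~/.speakturbo/config, whose format is not described here; the function name and its parameters are hypothetical.

```python
from pathlib import Path


def is_write_allowed(target, allowed_dirs):
    """Return True if `target` is a .wav path inside one of `allowed_dirs`."""
    target_path = Path(target).expanduser().resolve()
    if target_path.suffix.lower() != ".wav":
        return False
    for directory in allowed_dirs:
        base = Path(directory).expanduser().resolve()
        # Both paths are resolved first, so a target that escapes the
        # directory via ".." is rejected. Requires Python 3.9+.
        if target_path.is_relative_to(base):
            return True
    return False
```

A check of this kind explains why saving to a directory that is not listed fails even when ordinary file permissions would allow the write.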
