Official Verified utilities Safety 4/5

speakturbo-tts

Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast text-to-speech (TTS) with ~90ms latency and 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.

Why use this skill?

Give your OpenClaw agent instant voice capabilities with speakturbo-tts. Experience ~90ms latency, 8 built-in voices, and efficient audio output for real-time interaction.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/emzod/speakturbo-tts

What This Skill Does

speakturbo-tts is a low-latency text-to-speech engine integrated into the OpenClaw ecosystem. It is designed for real-time interaction and reaches ~90ms latency once the daemon is warmed up. A lightweight Rust CLI wrapper communicates with a persistent Python-based daemon, which uses the pocket-tts architecture for fast audio synthesis. The 8 built-in voices let you change the persona of your agent instantly. The skill suits developers building voice-responsive interfaces, interactive dashboards, or real-time notification systems where waiting for cloud-based synthesis would disrupt the user flow.

Installation

To integrate this skill into your environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/emzod/speakturbo-tts

Ensure you have the necessary system audio dependencies installed to handle the 24kHz mono stream output. The skill automatically spawns the daemon on its first execution.
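To confirm that a saved file matches the 24kHz mono format mentioned above, a small check with Python's standard `wave` module can help. This is a minimal sketch: the function names are hypothetical, and it only inspects the WAV header of a file you already have.

```python
import wave

EXPECTED_RATE = 24_000   # 24kHz, as stated in the Installation notes
EXPECTED_CHANNELS = 1    # mono


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def matches_expected_format(path):
    """True when the file is 24kHz mono; sample width is not checked."""
    rate, channels, _ = describe_wav(path)
    return rate == EXPECTED_RATE and channels == EXPECTED_CHANNELS
```

Calling `matches_expected_format("out.wav")` on a file produced by the skill should return True if the output format is as described; a False result suggests the file came from another source or was resampled.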

Use Cases

  • Real-Time AI Assistants: Provide instant auditory feedback for your Claude or local LLM instances, making the AI feel more present and reactive.
  • Accessibility Tools: Use the text-to-speech capability to read logs, system warnings, or chat responses aloud for visually impaired users or for hands-free workflows.
  • Event Notifications: Trigger vocal alerts for system events, build completions, or time-sensitive task reminders.
  • Rapid Prototyping: Quickly add synthetic voice output to automation scripts without needing external API keys or complex cloud configurations.
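For the notification and automation use cases above, a script could shell out to the CLI. The sketch below rests on assumptions that this page does not confirm: the executable name `speakturbo` and passing the text as a positional argument are both hypothetical. Only the `-q` flag is taken from the Tips section.

```python
import shutil
import subprocess

# Assumption: the executable name is not given on this page.
CLI_NAME = "speakturbo"


def build_command(text, quiet=True, cli=CLI_NAME):
    """Build the argument list; -q is the quiet flag mentioned under Tips."""
    command = [cli]
    if quiet:
        command.append("-q")
    # Assumption: the text to speak is passed as a positional argument.
    command.append(text)
    return command


def speak(text, quiet=True, cli=CLI_NAME):
    """Speak text if the CLI is on PATH; return False when it is missing."""
    if shutil.which(cli) is None:
        return False
    subprocess.run(build_command(text, quiet, cli), check=True)
    return True
```

A build script might end with `speak("Build finished")`. Passing the arguments as a list rather than a shell string avoids quoting problems when the spoken text contains quotes or other shell metacharacters.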

Example Prompts

  1. "Speak the current status of my system build using the marius voice."
  2. "Read the last five lines of the output log aloud so I can listen while I work."
  3. "Summarize the latest project updates and speak them using the alba voice."

Tips & Limitations

  • Cold start: The first execution takes 2-5 seconds to initialize the daemon and load the model into memory; plan for this if you use the skill in a startup sequence.
  • Performance: Keep the daemon warm so that subsequent calls stay near the ~90ms latency.
  • Quiet mode: Use the -q flag for cleaner terminal output when integrating into a larger automated pipeline.
  • File system security: The tool enforces an allowlist for writing .wav files. If you encounter errors when saving files, update your ~/.speakturbo/config file to include your intended directory paths.
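To illustrate what a directory allowlist for .wav output means in practice, here is a minimal sketch of such a check. It is not the tool's implementation, and the format of ~/.speakturbo/config is not described on this page, so the allowed directories are passed in as a plain list. `is_write_allowed` is a hypothetical name, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def is_write_allowed(target, allowed_dirs):
    """Return True when target resolves to a path inside an allowed directory."""
    # Resolving first collapses ".." segments, so traversal out of an
    # allowed directory is rejected.
    resolved = Path(target).expanduser().resolve()
    for directory in allowed_dirs:
        root = Path(directory).expanduser().resolve()
        if resolved.is_relative_to(root):
            return True
    return False
```

With an allowlist of `["/tmp/audio"]`, the path `/tmp/audio/out.wav` passes while `/tmp/audio/../secret.wav` does not, which is the kind of behavior that would produce a save error until the intended directory is added to the config.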

Metadata

Author: @emzod
Stars: 2387
Views: 1
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-emzod-speakturbo-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
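If clawhub.json already lists other plugins, pasting the fragment over it would drop them. The sketch below merges the entry into an existing configuration instead. The helper names are hypothetical, and it assumes clawhub.json is plain JSON with a top-level "plugins" object, as the fragment above suggests.

```python
import json

PLUGIN_KEY = "official-emzod-speakturbo-tts"


def enable_plugin(config, key=PLUGIN_KEY):
    """Return a copy of config with the plugin entry added or updated."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    entry = dict(plugins.get(key, {}))
    entry.update({"enabled": True, "auto_update": True})
    plugins[key] = entry
    merged["plugins"] = plugins
    return merged


def update_config_file(path="clawhub.json"):
    """Read the config file, merge the plugin entry, and write it back."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(enable_plugin(config), handle, indent=2)
```

Existing plugin entries and unrelated top-level keys are left as they were; only the speakturbo-tts entry is created or updated.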

Tags (AI)

#tts #voice #audio #latency #cli
Safety Score: 4/5

Flags: file-write, file-read