Official Verified media Safety 5/5

piper-tts

Local text-to-speech using Piper for voice message delivery. Use when the user asks for voice responses, audio messages, TTS, text-to-speech, voice notes, or wants to hear something spoken aloud. Converts text to speech locally (no cloud APIs, no per-request cost, no network round-trip) and delivers the result as a voice message on Telegram, Discord, or any channel that supports audio.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/bewareofddog/beware-piper-tts

What This Skill Does

The piper-tts skill integrates the Piper text-to-speech engine directly into your OpenClaw agent. Piper runs a neural voice model locally, so your agent can generate natural-sounding audio responses without relying on external cloud APIs. Because synthesis happens on your machine, there is no network round-trip and no per-request cost, and no text or audio is transmitted to third-party servers. When invoked, the skill takes text input, converts it to an audio stream, and outputs an MP3 file that the agent then delivers as a native voice message on platforms like Telegram or Discord. It suits users who prefer listening to responses rather than reading them.

Installation

To begin, ensure you have Python 3.9+ installed on your system. Navigate to your OpenClaw root directory and execute the setup script: scripts/setup-piper.sh. This command automates the installation of the necessary Python dependencies and downloads the default en_US-kusal-medium voice model. If you wish to expand your voice library, you can install additional models by providing the voice name as an argument to the setup script, such as scripts/setup-piper.sh --voice en_US-ryan-high.
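The installation steps above can be sketched as a small Python wrapper. This is only an illustration: the helper names (`python_ok`, `setup_command`, `run_setup`) are hypothetical, and invoking the script through `bash` is an assumption; the script path and the `--voice` argument are taken from the text above.

```python
import subprocess
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the installation notes
SETUP_SCRIPT = "scripts/setup-piper.sh"  # relative to the OpenClaw root


def python_ok(version=None):
    """Return True when the interpreter meets the 3.9+ requirement."""
    return tuple(version or sys.version_info[:2]) >= MIN_PYTHON


def setup_command(voice=None):
    """Build the setup invocation, optionally requesting an extra voice."""
    cmd = ["bash", SETUP_SCRIPT]  # assumption: the script is run via bash
    if voice:
        cmd += ["--voice", voice]
    return cmd


def run_setup(voice=None):
    """Run the setup script; raises CalledProcessError if it fails."""
    if not python_ok():
        raise RuntimeError("Python 3.9+ is required")
    subprocess.run(setup_command(voice), check=True)
```

Calling `run_setup()` from the OpenClaw root would install the default voice, while `run_setup("en_US-ryan-high")` would request the additional model named above.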

Use Cases

This skill is perfect for scenarios where accessibility and convenience are paramount. It is highly effective for delivering long-form answers while the user is commuting, summarizing complex technical data into an audio brief, or providing a more personal, interactive feel during conversational tasks. It is best used on an ad-hoc basis when a user explicitly requests audio, rather than as a forced, global response setting.

Example Prompts

  1. "Can you explain how this code works, but send it as a voice note so I can listen while walking?"
  2. "Tell me a funny joke to brighten my mood, and please use the voice message format."
  3. "Summarize the latest news headlines for me. I'd prefer to hear it in a British accent."

Tips & Limitations

Piper is exceptionally fast, typically generating audio within one second. To maintain optimal system performance, do not set messages.tts.auto: "always" in your configuration, as this will force every response to incur processing time. Instead, keep TTS usage intentional. Be aware that while Piper is lightweight, it requires local disk space for voice models. If you encounter errors, ensure your system PATH is correctly configured to locate the Piper binaries. Since this runs locally, it is restricted by your local machine's hardware capabilities, though it is optimized for both Apple Silicon and Linux environments.
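The two pitfalls above (a global `auto: "always"` setting and a missing binary on PATH) could be checked with a short script. This is a sketch under stated assumptions: the configuration is assumed to be available as nested dictionaries keyed `messages` → `tts` → `auto`, and the executable name `piper` is a guess; `tts_warnings` is a hypothetical helper, not part of the skill.

```python
import shutil


def tts_warnings(config, binary_name="piper"):
    """Collect warnings for the two pitfalls described above.

    `config` is assumed to be the agent configuration loaded as nested
    dicts; `binary_name` is an assumed executable name.
    """
    warnings = []
    auto = config.get("messages", {}).get("tts", {}).get("auto")
    if auto == "always":
        warnings.append('messages.tts.auto is "always": every reply will be synthesized')
    # shutil.which returns None when the executable cannot be found on PATH
    if shutil.which(binary_name) is None:
        warnings.append(f"{binary_name!r} was not found on PATH")
    return warnings
```

An empty list means neither problem was detected.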

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-bewareofddog-beware-piper-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
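
If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. A minimal sketch of merging the entry programmatically follows; the helper names are hypothetical, and it assumes the file is plain JSON with a top-level `plugins` object, as the snippet above suggests.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-bewareofddog-beware-piper-tts"  # key from the snippet above


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of `config` with the plugin entry added or updated."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))  # keep existing plugin entries
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_file(path="clawhub.json"):
    """Rewrite the file in place, keeping any other plugins it lists."""
    file = Path(path)
    config = json.loads(file.read_text()) if file.exists() else {}
    file.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

`enable_plugin` leaves its input untouched, so the merged result can be inspected before `update_file` writes anything to disk.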

Tags(AI)

#tts #voice #audio #local-ai #accessibility

Safety Score: 5/5

Flags: file-write, file-read, code-execution