Official | Verified | communication | Safety 4/5

voice-reply

Local text-to-speech using Piper voices via sherpa-onnx. Runs 100% offline with no API keys required. Use it when the user asks for a voice reply, audio response, or spoken answer, or wants to hear something read aloud. Supports multiple languages, including German (thorsten) and English (ryan) voices. Outputs Telegram-compatible voice notes with the [[audio_as_voice]] tag.

Why use this skill?

Convert OpenClaw text responses into high-quality, local voice messages using Piper and sherpa-onnx. Completely offline, private, and compatible with Telegram voice bubbles.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/stolot0mt0m/voice-reply

What This Skill Does

The voice-reply skill lets OpenClaw answer users with generated speech. It uses the sherpa-onnx engine and Piper text-to-speech (TTS) models to convert text into audio. Unlike cloud-based TTS services that require a subscription or an internet connection, this skill runs entirely on the local machine. It ships support for an English voice (ryan) and a German voice (thorsten), which suits multilingual automation tasks. The generated audio is emitted in the format that triggers Telegram's native voice message UI, so replies appear as voice bubbles rather than file attachments.

Installation

To install, run the OpenClaw hub command: clawhub install openclaw/skills/skills/stolot0mt0m/voice-reply. For a manual setup, make sure ffmpeg is installed on your system. Place the sherpa-onnx runtime and the required Piper voice models in the directories named by the skill's environment variables, SHERPA_ONNX_DIR and PIPER_VOICES_DIR. Once these are set, make sure the voice-reply binary is executable and on your system path.
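The manual setup requirements above can be verified before the first run. The following is a minimal Python sketch, not part of the skill itself: the function name check_prerequisites is hypothetical, and it assumes the binary is literally named voice-reply, as the text above suggests. It only checks that the tools are on the path and that the two environment variables point to existing directories; it does not validate the contents of those directories.

```python
import os
import shutil

# Environment variables named in the installation notes.
REQUIRED_ENV_DIRS = ("SHERPA_ONNX_DIR", "PIPER_VOICES_DIR")
# Tool names are assumptions taken from the installation text.
REQUIRED_TOOLS = ("ffmpeg", "voice-reply")


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    for name in REQUIRED_ENV_DIRS:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(path):
            problems.append(f"{name} does not point to a directory: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

Running the script prints one line per missing prerequisite and nothing when every check passes.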

Use Cases

This skill suits scenarios where the user needs an audible confirmation or response while away from a screen, such as smart home notifications or hands-free interaction. It works well for daily briefings, urgent alerts, and simple interactive voice response (IVR) systems. Because it runs 100% offline, it also fits privacy-focused environments, such as local server deployments where sensitive data must not be sent to cloud APIs.

Example Prompts

  1. "Read out the current status of my server updates using the voice reply feature."
  2. "Can you give me a voice confirmation once the backup task finishes?"
  3. "Summarize the last system log entries and speak the summary to me."

Tips & Limitations

The voice-reply skill currently supports English and German natively. It includes language auto-detection, but explicit language selection is recommended for mixed-language text so that the correct voice model is used. Audio synthesis is CPU-intensive, so make sure your machine has adequate CPU resources if you plan to trigger the skill frequently. The skill writes audio files to the temporary directory, so make sure /tmp has enough free space for intermediate .ogg files.
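Two of the tips above, explicit language selection and checking temporary disk space, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the skill's actual interface: the voice names come from the description, while the language codes "en" and "de", the function names, and the free-space threshold are all hypothetical.

```python
import shutil
import tempfile

# Voice names come from the skill description; the language codes are an assumption.
VOICE_BY_LANGUAGE = {"en": "ryan", "de": "thorsten"}


def pick_voice(language):
    """Map an explicit language code to a voice name; reject unsupported languages."""
    try:
        return VOICE_BY_LANGUAGE[language.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None


def has_temp_space(min_free_bytes, directory=None):
    """Return True if the temp directory has at least min_free_bytes free."""
    directory = directory or tempfile.gettempdir()
    return shutil.disk_usage(directory).free >= min_free_bytes


if __name__ == "__main__":
    print(pick_voice("de"))
    # 50 MB is an arbitrary illustrative threshold, not a documented requirement.
    print(has_temp_space(50 * 1024 * 1024))
```

Failing on an unsupported language, rather than falling back to a default voice, avoids synthesizing text with the wrong model in mixed-language setups.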

Metadata

Stars: 982
Views: 0
Updated: 2026-02-14

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-stolot0mt0m-voice-reply": {
      "enabled": true,
      "auto_update": true
    }
  }
}
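If clawhub.json already lists other plugins, pasting the fragment over the file would drop them. The Python sketch below shows one way to merge the entry instead. It assumes only the structure shown in the fragment above; the helper name enable_plugin and the example existing entry are hypothetical.

```python
import json

PLUGIN_ID = "official-stolot0mt0m-voice-reply"


def enable_plugin(config):
    """Return a copy of config with the voice-reply entry added or updated."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


# Hypothetical existing configuration with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
print(json.dumps(enable_plugin(existing), indent=2))
```

The function leaves the input dictionary unchanged and keeps any plugin entries that were already present.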

Tags (AI)

#tts #offline #voice #audio #accessibility
Safety Score: 4/5

Flags: file-write, file-read, code-execution