voice-reply
Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required. Use when the user asks for a voice reply, audio response, or spoken answer, or wants to hear something read aloud. Supports multiple languages, including German (thorsten) and English (ryan) voices. Outputs Telegram-compatible voice notes with the [[audio_as_voice]] tag.
Why use this skill?
Convert OpenClaw text responses into high-quality, local voice messages using Piper and sherpa-onnx. Completely offline, private, and compatible with Telegram voice bubbles.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/stolot0mt0m/voice-reply
What This Skill Does
The voice-reply skill lets OpenClaw answer users with generated speech. It uses the sherpa-onnx engine with Piper text-to-speech (TTS) models to convert text into audio. Unlike cloud-based TTS services that require subscriptions or internet connectivity, this skill runs entirely locally. It supports English (ryan) and German (thorsten) voices, which makes it suitable for multilingual automation tasks. The generated audio is written in a format that triggers Telegram's native voice message UI, so replies appear as voice bubbles.
Installation
To install, use the OpenClaw hub command: clawhub install openclaw/skills/skills/stolot0mt0m/voice-reply. For manual setup, make sure ffmpeg is installed on your system. Place the sherpa-onnx runtime and the required Piper voice models in the directories named by the skill's environment variables (SHERPA_ONNX_DIR and PIPER_VOICES_DIR). Then make sure the voice-reply binary is executable and on your system PATH.
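The manual setup requirements can be verified before the first run. The following is a minimal sketch, not part of the skill: check_prerequisites is a hypothetical helper that only confirms ffmpeg is on the PATH and that SHERPA_ONNX_DIR and PIPER_VOICES_DIR point to existing directories. It does not validate the contents of those directories.

```python
import os
import shutil


def check_prerequisites(env=None):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration. `env` defaults to the process
    environment and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    problems = []

    # ffmpeg must be discoverable on the PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # Both directories named in the installation notes must exist.
    for var in ("SHERPA_ONNX_DIR", "PIPER_VOICES_DIR"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{var} points to a missing directory: {value}")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```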
Use Cases
This skill is ideal for scenarios where the user needs an auditory confirmation or response while away from a screen, such as smart home notifications or hands-free interaction. It works effectively for daily briefings, urgent alerts, or simple interactive voice response (IVR) systems. Because it is 100% offline, it is also perfect for privacy-focused environments, such as local server deployments where you do not want to send sensitive data to cloud APIs.
Example Prompts
- "Read out the current status of my server updates using the voice reply feature."
- "Can you give me a voice confirmation once the backup task finishes?"
- "Summarize the last system log entries and speak them to me."
Tips & Limitations
The voice-reply skill currently supports English and German. Language auto-detection is included, but explicit language selection is recommended for mixed-language text so that the correct voice model is used. Speech synthesis is CPU-intensive, so make sure your machine has enough CPU headroom if you plan to trigger it frequently. Audio files are written to the temporary directory; make sure /tmp has enough free space for the temporary .ogg files.
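Both tips can be expressed as small guard functions. This is a sketch under stated assumptions: the language codes "en" and "de", the helper names, and the 50 MB threshold are all illustrative and do not come from the skill. Only the voice names ryan and thorsten are taken from the description above.

```python
import shutil
import tempfile

# Voice names come from the skill description; the language-code keys are assumed.
VOICES = {"en": "ryan", "de": "thorsten"}


def pick_voice(language: str) -> str:
    """Map an explicit language code to a voice name, rejecting unsupported codes."""
    try:
        return VOICES[language.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None


def temp_dir_has_space(min_free_bytes: int = 50 * 1024 * 1024) -> bool:
    """Check that the system temporary directory has at least min_free_bytes free.

    The 50 MB default is an arbitrary illustrative threshold.
    """
    free = shutil.disk_usage(tempfile.gettempdir()).free
    return free >= min_free_bytes


if __name__ == "__main__":
    print("voice for 'de':", pick_voice("de"))
    print("enough temp space:", temp_dir_has_space())
```

Selecting the voice explicitly avoids relying on auto-detection when a message mixes English and German.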
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-stolot0mt0m-voice-reply": {
      "enabled": true,
      "auto_update": true
    }
  }
}
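If clawhub.json already contains other settings, the plugin entry should be merged in rather than pasted over the file. A minimal sketch of that merge follows. The enable_plugin helper and the file location are assumptions; only the plugin key and its two fields come from the snippet above.

```python
import json
from pathlib import Path

# Plugin key copied from the configuration snippet above.
PLUGIN_ID = "official-stolot0mt0m-voice-reply"


def enable_plugin(config_path: Path) -> dict:
    """Add the plugin entry to the config file while keeping existing settings.

    Hypothetical helper: creates the file if it does not exist yet.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}

    # Create the "plugins" section if needed, then set this plugin's entry.
    plugins = config.setdefault("plugins", {})
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Assumed location: clawhub.json in the current working directory.
    print(json.dumps(enable_plugin(Path("clawhub.json")), indent=2))
```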
Flags: file-write, file-read, code-execution