What This Skill Does

The voice-reply skill lets OpenClaw answer users with generated speech. It uses the sherpa-onnx engine and Piper text-to-speech (TTS) models to convert text into audio. Unlike cloud-based TTS services, which require a subscription or an internet connection, this skill runs entirely on the local machine. It supports an English voice (ryan) and a German voice (thorsten), which suits bilingual automation tasks. The audio is written as an .ogg file, the format that triggers Telegram's native voice message UI, so replies from the assistant feel conversational.
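As a rough illustration of the last step, the sketch below builds an ffmpeg command that re-encodes a synthesized WAV file as OGG/Opus, the encoding Telegram expects for voice messages. The function name, the 32k bitrate, and the mono setting are assumptions made for this example; the skill's actual conversion parameters are not documented here.

```python
from pathlib import Path


def build_ogg_opus_command(wav_path: str, ogg_path: str, bitrate: str = "32k") -> list[str]:
    """Return an ffmpeg argument list that re-encodes a WAV file as OGG/Opus.

    Hypothetical helper: the bitrate and channel count are illustrative
    defaults, not values taken from the voice-reply skill.
    """
    if Path(ogg_path).suffix.lower() != ".ogg":
        raise ValueError("output path should end in .ogg")
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it already exists
        "-i", wav_path,     # input: the synthesized speech
        "-c:a", "libopus",  # encode the audio stream with Opus
        "-b:a", bitrate,    # target audio bitrate
        "-ac", "1",         # downmix to mono, which is enough for speech
        ogg_path,
    ]
```

The returned list can be passed to subprocess.run(..., check=True) on a machine where ffmpeg is installed with libopus support.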

Installation

To install, use the OpenClaw hub command: clawhub install openclaw/skills/skills/stolot0mt0m/voice-reply. For manual setup, install ffmpeg, then place the sherpa-onnx runtime and the required Piper voice models in the directories named by the skill's environment variables (SHERPA_ONNX_DIR and PIPER_VOICES_DIR). Once these are set, make sure the voice-reply binary is executable and on your system PATH.
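A manual setup can be sanity-checked with a short script. The sketch below is a minimal preflight check, assuming only what the steps above state: ffmpeg and voice-reply should be on the PATH, and SHERPA_ONNX_DIR and PIPER_VOICES_DIR should point to existing directories. The check_setup function is a hypothetical helper and not part of the skill.

```python
import os
import shutil
from pathlib import Path

# Environment variables named in the installation instructions.
REQUIRED_DIR_VARS = ("SHERPA_ONNX_DIR", "PIPER_VOICES_DIR")
# Executables that should be reachable through PATH.
REQUIRED_TOOLS = ("ffmpeg", "voice-reply")


def check_setup(env=None, which=shutil.which) -> list[str]:
    """Return a list of setup problems; an empty list means the basics are in place.

    `env` and `which` are injectable so the check can be exercised
    without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    for var in REQUIRED_DIR_VARS:
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{var} points to a missing directory: {value}")
    return problems
```

Calling check_setup() with no arguments inspects the current environment; printing each returned string gives a quick list of what still needs fixing. It does not verify that the directories contain the expected runtime or model files.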

Use Cases

This skill is useful when the user needs an audible confirmation or response while away from a screen, such as smart home notifications or hands-free interaction. It works well for daily briefings, urgent alerts, or simple interactive voice response (IVR) systems. Because synthesis runs fully offline, it also suits privacy-focused environments, such as local server deployments where sensitive data must not be sent to cloud APIs.

Example Prompts

"Read out the current status of my server updates using the voice reply feature."
"Can you give me a voice confirmation once the backup task finishes?"
"Summarize the last system log entries and speak it to me."

Tips & Limitations

The voice-reply skill currently supports English and German. It includes automatic language detection, but explicit language selection is recommended for mixed-language text so that the correct voice model is used. Speech synthesis is CPU-intensive, so make sure the machine has enough CPU headroom if you plan to trigger the skill frequently. Audio files are written to the temporary directory, so /tmp needs enough free space for the temporary .ogg files.
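Both tips can be sketched in a few lines. The example below maps an explicit language tag to one of the two voices named in this document and checks free space in the system temporary directory before synthesis. The function names and the 50 MB threshold are assumptions for illustration; how the skill itself selects a voice or maps voice names to model files is not specified here.

```python
import shutil
import tempfile

# The two voices named in this document, keyed by language code.
VOICES = {"en": "ryan", "de": "thorsten"}


def pick_voice(language: str) -> str:
    """Map a language tag such as "en", "de" or "de-DE" to a voice name.

    Raises ValueError for unsupported languages rather than guessing.
    """
    code = language.strip().lower()[:2]
    if code not in VOICES:
        raise ValueError(f"unsupported language: {language!r}")
    return VOICES[code]


def has_temp_space(min_free_bytes: int = 50 * 1024 * 1024) -> bool:
    """Return True if the temporary directory has at least min_free_bytes free.

    The 50 MB default is an arbitrary illustrative threshold.
    """
    free = shutil.disk_usage(tempfile.gettempdir()).free
    return free >= min_free_bytes
```

Rejecting unknown languages with an error, instead of falling back to a default voice, keeps mixed-language mistakes visible to the caller.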
