sherpa-onnx-tts
Local text-to-speech via sherpa-onnx (offline, no cloud)
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/andy27725/sherpa-onnx-tts-andy27725
What This Skill Does
The sherpa-onnx-tts skill gives your OpenClaw agent a fully offline text-to-speech (TTS) engine. It uses the sherpa-onnx framework to synthesize speech from text locally, with no connection to external cloud services or API endpoints. Text never leaves the machine and synthesis involves no network round trip, which suits privacy-conscious users and environments with restricted internet access. The skill supports multiple voice models, including those from the Piper project, so you can pick a voice for its tone and clarity.
Installation
Installation takes three steps. First, download the official sherpa-onnx runtime for your operating system and extract it into ~/.openclaw/tools/sherpa-onnx-tts/runtime. Second, download your preferred voice model from the sherpa-onnx repository and place it in ~/.openclaw/tools/sherpa-onnx-tts/models. Third, update your ~/.openclaw/openclaw.json file so that the environment variables point to your runtime and model directories, as specified in the configuration documentation. Once configured, you can add the tool's bin directory to your system PATH so the agent can run speech synthesis commands directly.
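Before editing openclaw.json, it can help to confirm that the two directories from the first two steps are actually populated. The following is a minimal sketch, not part of the skill: the function name check_layout is hypothetical, and the only things taken from this page are the runtime and models paths.

```python
from pathlib import Path
from typing import Dict, Optional


def check_layout(base: Optional[Path] = None) -> Dict[str, bool]:
    """Report whether the runtime and models directories exist and are non-empty.

    `base` defaults to ~/.openclaw/tools/sherpa-onnx-tts, the location named
    in the installation steps. The function itself is a hypothetical helper.
    """
    if base is None:
        base = Path.home() / ".openclaw" / "tools" / "sherpa-onnx-tts"
    report = {}
    for name in ("runtime", "models"):
        directory = base / name
        # A directory counts as ready only if it exists and holds at least one entry.
        report[name] = directory.is_dir() and any(directory.iterdir())
    return report
```

Calling check_layout() with no arguments inspects the default location; a False value for either key means the corresponding download or extraction step still needs to be done.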
Use Cases
This skill suits local voice assistants, reading documents aloud for accessibility, and narration for local multimedia projects. Because it runs entirely offline, it is particularly useful in secure environments where data exfiltration is a concern, and for developers who need speech output without ongoing cloud infrastructure costs. It also works as a building block for agents that give audio feedback directly to the user.
Example Prompts
- "Speak the following text aloud using the local TTS engine: 'The system update is now complete.'"
- "Convert this article into an audio file named lecture.wav using the high-quality VITS model."
- "Summarize the previous log entries and use the sherpa-onnx-tts tool to read the summary to me."
Tips & Limitations
Output quality and speed depend on the model you download: naturalness and synthesis time vary from model to model. The standard VITS models are high quality, but larger models may require more system memory. Make sure your environment variables are configured correctly; if your model directory contains multiple .onnx files, you must explicitly set SHERPA_ONNX_MODEL_FILE to avoid runtime errors. Finally, because the skill is local, nothing is updated for you from the cloud: you are responsible for managing and updating your local model library manually.
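The multiple-.onnx case can be illustrated with a short sketch. This is not the skill's actual resolution logic, which this page does not show; resolve_model_file is a hypothetical helper, and treating the override as a file name relative to the model directory is an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_model_file(model_dir: Path, override: Optional[str] = None) -> Path:
    """Pick the .onnx file to load, insisting on an explicit choice when ambiguous.

    `override` stands in for the value of SHERPA_ONNX_MODEL_FILE (assumed here
    to be a file name inside `model_dir`; an absolute path also works because
    joining onto an absolute path returns that path unchanged).
    """
    if override:
        return model_dir / override
    candidates = sorted(model_dir.glob("*.onnx"))
    if len(candidates) == 1:
        # Exactly one model file: no ambiguity, so no override is needed.
        return candidates[0]
    if not candidates:
        raise FileNotFoundError(f"no .onnx file found in {model_dir}")
    names = ", ".join(p.name for p in candidates)
    raise ValueError(f"multiple .onnx files ({names}); set SHERPA_ONNX_MODEL_FILE")
```

A caller might pass os.environ.get("SHERPA_ONNX_MODEL_FILE") as the override, so that a directory with a single model works with no extra configuration while an ambiguous one fails early with a clear message.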
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-andy27725-sherpa-onnx-tts-andy27725": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-read, file-write
Related Skills
memos-memory-guide
Use the MemOS Lite memory system to search and use the user's past conversations. Use this skill whenever the user refers to past chats, their own preferences or history, or when you need to answer from prior context. When auto-recall returns nothing (long or unclear user query), generate your own short search query and call memory_search. Use task_summary when you need full task context, skill_get for experience guides, and memory_timeline to expand around a memory hit.
freeride
Manages free AI models from OpenRouter for OpenClaw. Automatically ranks models by quality, configures fallbacks for rate-limit handling, and updates openclaw.json. Use when the user mentions free AI, OpenRouter, model switching, rate limits, or wants to reduce AI costs.
sag
ElevenLabs text-to-speech with mac-style say UX.
openai-image-gen
Batch-generate images via OpenAI Images API. Random prompt sampler + `index.html` gallery.
openai-whisper
Local speech-to-text with the Whisper CLI (no API key).