Official Verified media Safety 5/5

sherpa-onnx-tts

Local text-to-speech via sherpa-onnx (offline, no cloud)

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/andy27725/sherpa-onnx-tts-andy27725

What This Skill Does

The sherpa-onnx-tts skill gives your OpenClaw agent a fully offline text-to-speech (TTS) engine. It uses the sherpa-onnx framework to synthesize speech from text on the local machine, with no connection to external cloud services or API endpoints. Because the text never leaves the machine and synthesis involves no network round trip, the skill suits privacy-conscious users and environments with restricted internet access. It works with multiple voice models, including models from the Piper project, so you can choose a voice for tone and clarity.

Installation

Installation takes three steps, plus an optional PATH change. First, download the official sherpa-onnx runtime for your operating system and extract it into ~/.openclaw/tools/sherpa-onnx-tts/runtime. Second, download a voice model from the sherpa-onnx repository and place it in ~/.openclaw/tools/sherpa-onnx-tts/models. Third, update ~/.openclaw/openclaw.json so that the skill's environment variables point to your runtime and model directories, as described in the configuration documentation. Once this is configured, you can add the tool's bin directory to your system PATH so the agent can run speech synthesis commands directly.
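
As a rough way to verify the first two steps, the sketch below checks that the runtime and models directories exist and are non-empty, then prints an environment mapping you could adapt for openclaw.json. The directory paths come from the steps above. The variable names SHERPA_ONNX_RUNTIME_DIR and SHERPA_ONNX_MODEL_DIR are assumptions used for illustration, and the place where the mapping belongs inside openclaw.json is not shown here; take both from the skill's configuration documentation.

```python
import json
from pathlib import Path

# Directory layout taken from the installation steps above.
TOOL_ROOT = Path.home() / ".openclaw" / "tools" / "sherpa-onnx-tts"


def check_layout(tool_root: Path) -> dict:
    """Report whether each expected directory exists and is non-empty."""
    status = {}
    for name in ("runtime", "models"):
        directory = tool_root / name
        status[name] = directory.is_dir() and any(directory.iterdir())
    return status


def env_fragment(tool_root: Path) -> dict:
    """Build an environment mapping for the skill.

    ASSUMPTION: the variable names are placeholders for illustration;
    use the names given in the skill's configuration documentation.
    """
    return {
        "SHERPA_ONNX_RUNTIME_DIR": str(tool_root / "runtime"),
        "SHERPA_ONNX_MODEL_DIR": str(tool_root / "models"),
    }


if __name__ == "__main__":
    for name, ok in check_layout(TOOL_ROOT).items():
        print(f"{name}: {'ok' if ok else 'missing or empty'}")
    print(json.dumps(env_fragment(TOOL_ROOT), indent=2))
```

Running the script before editing openclaw.json makes it easier to catch a runtime or model that was extracted into the wrong directory.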

Use Cases

This skill suits local voice assistants, reading documents aloud for accessibility, and narration for local multimedia projects. Because it runs entirely offline, it is particularly useful in secure environments where data exfiltration is a concern, and for developers who need speech output without ongoing cloud infrastructure costs. It also works as a building block for agents that give audio feedback directly to the user.

Example Prompts

"Speak the following text aloud using the local TTS engine: 'The system update is now complete.'"
"Convert this article into an audio file named lecture.wav using the high-quality VITS model."
"Summarize the previous log entries and use the sherpa-onnx-tts tool to read the summary to me."

Tips & Limitations

The main limitation is that output depends on the model you download: speed and naturalness vary from model to model. The standard VITS models are high quality, but larger models may require more system memory. Check that your environment variables are set correctly; if your model directory contains multiple .onnx files, you must set SHERPA_ONNX_MODEL_FILE explicitly to avoid runtime errors. Finally, as a local tool it does not support cloud-only features, and you are responsible for managing and updating your local model library yourself.
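
The rule about multiple .onnx files can be illustrated with a small pre-flight check. This is a sketch of the rule as described above, not the skill's actual resolution logic: the function name resolve_model_file is hypothetical, and it assumes SHERPA_ONNX_MODEL_FILE holds a path to the chosen file.

```python
import os
from pathlib import Path


def resolve_model_file(model_dir: Path, env=os.environ) -> Path:
    """Pick the .onnx file to load, preferring an explicit override.

    ASSUMPTION: SHERPA_ONNX_MODEL_FILE is treated as a file path.
    """
    explicit = env.get("SHERPA_ONNX_MODEL_FILE")
    if explicit:
        return Path(explicit)

    candidates = sorted(model_dir.glob("*.onnx"))
    if len(candidates) == 1:
        # Only one model file: the choice is unambiguous.
        return candidates[0]
    if not candidates:
        raise FileNotFoundError(f"no .onnx file found in {model_dir}")

    # Several model files and no override: refuse to guess.
    names = ", ".join(p.name for p in candidates)
    raise ValueError(
        f"multiple .onnx files in {model_dir} ({names}); "
        "set SHERPA_ONNX_MODEL_FILE to choose one"
    )
```

Failing early with a message that names the candidate files is usually easier to debug than an error raised later by the runtime.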

Metadata

Author: @andy27725

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-andy27725-sherpa-onnx-tts-andy27725": {
      "enabled": true,
      "auto_update": true
    }
  }
}
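
If your clawhub.json already lists other plugins, pasting the fragment by hand risks overwriting them. The sketch below merges the entry into the existing file instead. The function name enable_plugin is hypothetical, and because the location of clawhub.json is not stated here, the caller supplies the path.

```python
import json
from pathlib import Path

# Plugin identifier copied from the configuration fragment above.
PLUGIN_ID = "official-andy27725-sherpa-onnx-tts-andy27725"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Merge the plugin entry into the config, keeping existing keys."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    # Create the "plugins" section if absent; leave other plugins untouched.
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Back up the file before running the sketch, since it rewrites the whole document with two-space indentation.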

Tags (AI)

#tts #speech #offline #audio #privacy

Safety Score: 5/5

Flags: file-read, file-write

Related Skills

memos-memory-guide

Use the MemOS Lite memory system to search and use the user's past conversations. Use this skill whenever the user refers to past chats, their own preferences or history, or when you need to answer from prior context. When auto-recall returns nothing (long or unclear user query), generate your own short search query and call memory_search. Use task_summary when you need full task context, skill_get for experience guides, and memory_timeline to expand around a memory hit.

andy27725 4473

freeride

Manages free AI models from OpenRouter for OpenClaw. Automatically ranks models by quality, configures fallbacks for rate-limit handling, and updates openclaw.json. Use when the user mentions free AI, OpenRouter, model switching, rate limits, or wants to reduce AI costs.

andy27725 4473

sag

ElevenLabs text-to-speech with mac-style say UX.

andy27725 4473

openai-image-gen

Batch-generate images via OpenAI Images API. Random prompt sampler + `index.html` gallery.

andy27725 4473

openai-whisper

Local speech-to-text with the Whisper CLI (no API key).

andy27725 4473