Official · Verified · media · Safety 5/5

sherpa-onnx-tts

Local text-to-speech via sherpa-onnx (offline, no cloud)

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/andy27725/sherpa-onnx-tts-andy27725
What This Skill Does

The sherpa-onnx-tts skill gives your OpenClaw agent a fully offline text-to-speech (TTS) engine. It uses the sherpa-onnx framework to synthesize speech from text locally, with no connection to external cloud services or API endpoints. Text never leaves the machine and there is no network round trip, which suits privacy-conscious users and environments with restricted internet access. Several voice models are supported, including those from the Piper project, so you can choose a voice for tone and clarity.

Installation

Installation takes three steps, plus an optional PATH change:

  1. Download the official sherpa-onnx runtime for your operating system and extract it into ~/.openclaw/tools/sherpa-onnx-tts/runtime.
  2. Download your preferred voice model from the sherpa-onnx repository and place it in ~/.openclaw/tools/sherpa-onnx-tts/models.
  3. Update your ~/.openclaw/openclaw.json file so that the environment variables point to your runtime and model directories, as specified in the configuration documentation.

Once configured, you can add the tool's bin directory to your system PATH so the agent can run speech synthesis commands directly.
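Before editing openclaw.json, it can help to confirm that the runtime and model directories named in the installation steps are actually populated. The following is a minimal sketch, not part of the skill: the helper name `missing_install_dirs` is hypothetical, and only the two directory paths come from the instructions above.

```python
from pathlib import Path

# Paths taken from the installation steps; the helper itself is hypothetical.
TOOL_ROOT = Path.home() / ".openclaw" / "tools" / "sherpa-onnx-tts"


def missing_install_dirs(tool_root: Path = TOOL_ROOT) -> list[str]:
    """Return the expected subdirectories that are absent or empty."""
    missing = []
    for name in ("runtime", "models"):
        directory = tool_root / name
        # Treat a directory as missing if it does not exist or has no entries.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_install_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Runtime and model directories look populated.")
```

The check only looks at whether each directory has content; it does not validate the runtime binaries or the model files themselves.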

Use Cases

This skill suits local voice assistants, automated reading of documents for accessibility, and narration for local multimedia projects. Because it is entirely offline, it is particularly useful in secure environments where data exfiltration is a concern, and for developers who need speech capabilities without ongoing cloud infrastructure costs. It also works as a foundational component for agents that give audio feedback directly to the user.

Example Prompts

  1. "Speak the following text aloud using the local TTS engine: 'The system update is now complete.'"
  2. "Convert this article into an audio file named lecture.wav using the high-quality VITS model."
  3. "Summarize the previous log entries and use the sherpa-onnx-tts tool to read the summary to me."

Tips & Limitations

Output quality depends on the model you download: speed and naturalness vary from model to model. The standard VITS models are high quality, but larger models may require more system memory. Check that your environment variables are configured correctly; if you use a model with multiple .onnx files, you must explicitly set SHERPA_ONNX_MODEL_FILE to avoid runtime errors. Finally, because the skill runs entirely locally, it offers no cloud-only features, so you are responsible for managing and updating your local model library yourself.
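The multiple-.onnx case can be illustrated with a small pre-flight check. This is a sketch under stated assumptions, not the skill's actual resolution logic: the function `resolve_model_file` is hypothetical, and treating a relative SHERPA_ONNX_MODEL_FILE value as relative to the model directory is an assumption. Only the variable name comes from the text above.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_model_file(model_dir: Path, env: Mapping[str, str] = os.environ) -> Path:
    """Pick the .onnx file to load from model_dir (hypothetical helper)."""
    override = env.get("SHERPA_ONNX_MODEL_FILE")
    if override:
        candidate = Path(override)
        # Assumption: a relative value is resolved against the model directory.
        if not candidate.is_absolute():
            candidate = model_dir / candidate
        if not candidate.is_file():
            raise FileNotFoundError(
                f"SHERPA_ONNX_MODEL_FILE points to a missing file: {candidate}"
            )
        return candidate

    onnx_files = sorted(model_dir.glob("*.onnx"))
    if not onnx_files:
        raise FileNotFoundError(f"No .onnx files found in {model_dir}")
    if len(onnx_files) > 1:
        names = ", ".join(p.name for p in onnx_files)
        # Ambiguous: ask the user to choose instead of guessing.
        raise ValueError(
            f"Multiple .onnx files ({names}); set SHERPA_ONNX_MODEL_FILE"
        )
    return onnx_files[0]
```

Failing loudly on ambiguity, rather than picking the first file alphabetically, keeps the choice of voice model explicit.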

Metadata

Author: @andy27725
Stars: 4473
Views: 1
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-andy27725-sherpa-onnx-tts-andy27725": {
      "enabled": true,
      "auto_update": true
    }
  }
}
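
If your clawhub.json already lists other plugins, pasting the snippet over the file would drop them. A minimal sketch of merging the entry instead is shown here; the helper `enable_plugin` is hypothetical, and the file location is passed in by the caller because this page does not say where clawhub.json lives.

```python
import json
from pathlib import Path

# Plugin key and settings copied from the snippet above.
PLUGIN_ID = "official-andy27725-sherpa-onnx-tts-andy27725"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update one plugin entry while keeping existing settings."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Create the "plugins" object if it is absent; leave other plugins alone.
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `enable_plugin(Path("clawhub.json"))` from the directory that holds the file rewrites it in place with the new entry added.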

Tags (AI)

#tts #speech #offline #audio #privacy
Safety Score: 5/5

Flags: file-read, file-write