Official Verified utilities Safety 5/5

sapi-tts

Windows SAPI5 text-to-speech with Neural voices. Lightweight alternative to GPU-heavy TTS - zero GPU usage, instant generation. Auto-detects best available voice for your language. Works on Windows 10/11.

Why use this skill?

Instant, zero-GPU text-to-speech for Windows 10/11 using SAPI5. Perfect for low-resource local speech synthesis for AI agents and automation.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/korddie/sapi-tts

What This Skill Does

The sapi-tts skill is a lightweight text-to-speech skill for Windows 10 and 11. Unlike neural TTS solutions that need heavy GPU acceleration or cloud-based API calls, it uses the built-in Windows Speech API (SAPI5). Audio is generated locally with zero GPU overhead and very little delay, which makes the skill a good fit for low-resource systems and for users who prioritize speed and local privacy. The skill automatically detects the best available voice for your language preferences, and it supports both the Neural voices available in Windows 11 and high-quality legacy voices.
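
The skill's actual voice-detection logic is not documented on this page. As a minimal sketch of how "best available voice" selection could work, the Python below assumes two rules: an exact culture match beats a language-only match, and a Neural voice beats a Legacy one. The `Voice` record, the `pick_voice` function, and the voice names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Voice:
    """Hypothetical record for one installed voice (not the skill's own type)."""
    name: str
    culture: str  # e.g. "fr-FR"
    kind: str     # "Neural" or "Legacy"


def pick_voice(voices: Sequence[Voice], language: str) -> Optional[Voice]:
    """Return the best voice for `language`, or None if nothing matches.

    Assumed ranking: exact culture match first, then same base language;
    within each group, Neural voices are preferred over Legacy ones.
    """
    wanted = language.lower()
    wanted_base = wanted.split("-")[0]

    def score(voice: Voice) -> tuple:
        culture = voice.culture.lower()
        if culture == wanted:
            match = 2
        elif culture.split("-")[0] == wanted_base:
            match = 1
        else:
            match = 0
        return (match, voice.kind == "Neural")

    candidates = [v for v in voices if score(v)[0] > 0]
    return max(candidates, key=score, default=None)
```

With this ranking, a request for "fr-CA" picks a Canadian French Neural voice over a French (France) one, and a request for a language with no installed voice returns `None` so the caller can fall back to the system default.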

Installation

To integrate this skill into your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/korddie/sapi-tts

Once installed, ensure the script is accessible within your skills directory. You can verify your setup by running the script with the -ListVoices flag, which will display all compatible SAPI5 voices currently registered on your Windows system, categorized by their type (Neural or Legacy) and culture.
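
If you want to cross-check which voices Windows reports independently of the skill, the sketch below queries the .NET `System.Speech` API through Windows PowerShell from Python. It does not call the skill's script and does not reproduce its Neural/Legacy categorization; it only lists names and cultures. The helper names `parse_voices` and `list_voices` are hypothetical, and the code assumes `powershell.exe` (Windows PowerShell 5.1) is on the PATH.

```python
import json
import subprocess

# PowerShell pipeline that lists installed voices via System.Speech and
# emits JSON with the voice name and the culture name (e.g. "en-US").
LIST_VOICES_PS = (
    "Add-Type -AssemblyName System.Speech; "
    "(New-Object System.Speech.Synthesis.SpeechSynthesizer).GetInstalledVoices() | "
    "ForEach-Object { $_.VoiceInfo } | "
    "Select-Object Name, @{n='Culture';e={$_.Culture.Name}} | "
    "ConvertTo-Json"
)


def parse_voices(raw: str) -> list:
    """Parse ConvertTo-Json output into a list of {"Name", "Culture"} dicts.

    ConvertTo-Json emits a bare object (not an array) when only one voice
    is installed, and nothing at all when there are none.
    """
    raw = raw.strip()
    if not raw:
        return []
    data = json.loads(raw)
    return [data] if isinstance(data, dict) else list(data)


def list_voices() -> list:
    """Run the PowerShell pipeline (Windows only) and return parsed voices."""
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", LIST_VOICES_PS],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_voices(result.stdout)
```

On a Windows machine, `for v in list_voices(): print(v["Name"], v["Culture"])` prints one line per voice. The list may differ from what the skill's -ListVoices flag shows, since the skill may use other detection methods.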

Use Cases

This skill is perfect for scenarios where real-time feedback is required without the latency of cloud synthesis. Use it for reading back system notifications, creating audible alerts for your AI agent, or generating speech for desktop automation tasks. It is particularly useful for developers building offline-first applications that require accessibility features like screen readers or vocal status updates.

Example Prompts

  1. "Speak the following text aloud using the best available French neural voice: 'Bonjour, le processus est terminé.'"
  2. "List all my currently installed SAPI5 voices so I can choose a new one for my agent."
  3. "Convert this status report into speech using the default voice and set the playback rate to 1."
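
To illustrate what a request like the third prompt could translate to under the hood, here is a rough sketch that speaks text at a given rate through the .NET `System.Speech` API, driven from Python. It is not the skill's implementation. The function names and the `SAPI_TEXT` / `SAPI_VOICE` environment variables are hypothetical; the text is passed through the environment rather than interpolated into the command so that quotes in the text cannot break the PowerShell script. `SpeechSynthesizer.Rate` accepts values from -10 to 10, so the rate is clamped to that range.

```python
import os
import subprocess
from typing import Optional

SAPI_MIN_RATE = -10  # slowest rate accepted by SpeechSynthesizer.Rate
SAPI_MAX_RATE = 10   # fastest rate accepted by SpeechSynthesizer.Rate


def clamp_rate(rate: int) -> int:
    """Clamp a requested rate into the valid -10..10 range."""
    return max(SAPI_MIN_RATE, min(SAPI_MAX_RATE, int(rate)))


def build_speak_script(rate: int = 0, voice: Optional[str] = None) -> str:
    """Build a PowerShell script that reads its text from $env:SAPI_TEXT."""
    lines = [
        "Add-Type -AssemblyName System.Speech",
        "$s = New-Object System.Speech.Synthesis.SpeechSynthesizer",
        f"$s.Rate = {clamp_rate(rate)}",
    ]
    if voice:
        # The voice name is also read from the environment, never inlined.
        lines.append("$s.SelectVoice($env:SAPI_VOICE)")
    lines.append("$s.Speak($env:SAPI_TEXT)")
    return "; ".join(lines)


def speak(text: str, rate: int = 0, voice: Optional[str] = None) -> None:
    """Speak `text` aloud on Windows; blocks until playback finishes."""
    env = dict(os.environ, SAPI_TEXT=text)
    if voice:
        env["SAPI_VOICE"] = voice
    subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", build_speak_script(rate, voice)],
        env=env,
        check=True,
    )
```

On Windows, `speak("Status report ready.", rate=1)` would read the sentence with the default voice at a slightly faster than normal pace.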

Tips & Limitations

Because this skill relies on the Windows native engine, the quality of the output depends heavily on the voice packs you have installed via Windows Settings. For the most natural-sounding results, ensure you have downloaded the 'Speech' language packs in your Windows OS settings. Note that this skill is strictly for Windows environments. Since it uses local system resources, it does not require an active internet connection, ensuring your data stays private and your generation remains instantaneous regardless of server latency.

Metadata

Author: @korddie
Stars: 1656
Views: 0
Updated: 2026-02-28

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-korddie-sapi-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
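
If your clawhub.json already lists other plugins, pasting by hand risks overwriting them. The sketch below shows one way to merge the entry above into an existing configuration with Python. The function names are hypothetical, the file location is left to the caller because this page does not say where clawhub.json lives, and the code assumes the file is plain JSON with a top-level "plugins" object as shown above.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-korddie-sapi-tts"


def enable_plugin(config: dict) -> dict:
    """Return a copy of `config` with the sapi-tts plugin entry enabled.

    Other plugin entries and unrelated top-level keys are preserved.
    """
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path: str) -> None:
    """Merge the plugin entry into the JSON file at `path`, creating it if needed."""
    config_path = Path(path)
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    updated = enable_plugin(config)
    config_path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Calling `update_config_file("clawhub.json")` from the directory that holds the file adds or refreshes only the sapi-tts entry.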

Tags (AI)

#tts #windows #accessibility #speech #offline

Safety Score: 5/5

Flags: file-read, code-execution