sapi-tts
Windows SAPI5 text-to-speech with Neural voices. Lightweight alternative to GPU-heavy TTS: zero GPU usage, instant generation. Auto-detects the best available voice for your language. Works on Windows 10/11.
Why use this skill?
Instant, zero-GPU text-to-speech for Windows 10/11 using SAPI5. Well suited to low-resource, local speech synthesis for AI agents and automation.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/korddie/sapi-tts
What This Skill Does
The sapi-tts skill is a lightweight text-to-speech engine for Windows 10 and 11. Instead of running a GPU-accelerated neural TTS model or calling a cloud API, it uses the built-in Windows Speech API (SAPI5). Audio is generated instantly with zero GPU overhead, which makes the skill a good fit for low-resource systems and for users who prioritize speed and local privacy. The skill automatically detects the best available voice for your language, supporting both the Neural voices available in Windows 11 and high-quality legacy voices.
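The skill's exact selection logic is not documented on this page. As a rough illustration of what "best available voice for your language" could mean, the Python sketch below ranks installed voices by how closely their culture tag matches the requested one and prefers Neural voices over legacy voices on ties. The `Voice` record, the `pick_voice` function, the ranking order, and the voice names are all hypothetical; a real implementation would read the voice list from SAPI5 itself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Voice:
    """Hypothetical record describing one installed SAPI5 voice."""
    name: str
    culture: str  # culture tag such as "fr-FR"
    neural: bool  # True for Neural voices, False for legacy voices


def pick_voice(voices: Sequence[Voice], culture: str) -> Optional[Voice]:
    """Return the best-ranked voice for `culture`, or None if none are installed.

    Assumed ranking: exact culture match, then same language
    (e.g. "fr-CA" when "fr-FR" was requested), then any voice.
    Within each tier, Neural voices beat legacy voices.
    """
    if not voices:
        return None
    wanted = culture.lower()
    language = wanted.split("-")[0]

    def rank(voice: Voice) -> Tuple[int, int]:
        tag = voice.culture.lower()
        if tag == wanted:
            tier = 0
        elif tag.split("-")[0] == language:
            tier = 1
        else:
            tier = 2
        # Lower tuples win: better culture tier first, then Neural (0) over legacy (1).
        return (tier, 0 if voice.neural else 1)

    # min() keeps the first voice among equally ranked candidates.
    return min(voices, key=rank)


# Placeholder voice names, for illustration only.
installed = [
    Voice("Legacy EN", "en-US", neural=False),
    Voice("Legacy FR", "fr-FR", neural=False),
    Voice("Neural FR", "fr-FR", neural=True),
]
print(pick_voice(installed, "fr-FR").name)  # Neural FR
```

Ranking with a tuple key keeps the preference order explicit and easy to change, for example if you would rather take a legacy voice in the exact culture over a Neural voice in a sibling culture.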
Installation
To integrate this skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/korddie/sapi-tts
Once installed, make sure the script is accessible from your skills directory. To verify the setup, run the script with the -ListVoices flag: it lists every compatible SAPI5 voice registered on your Windows system, grouped by type (Neural or Legacy) and culture (for example, en-US or fr-FR).
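The script's actual output format is not shown on this page. The sketch below only illustrates the grouping the -ListVoices flag is described as performing: voices bucketed first by type (Neural or Legacy) and then by culture. It assumes each voice has already been read from SAPI5 into a hypothetical `(name, culture, kind)` tuple; `group_voices`, `format_listing`, and the printed layout are illustrative, not the skill's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape: (display name, culture tag, "Neural" or "Legacy")
VoiceInfo = Tuple[str, str, str]


def group_voices(voices: Iterable[VoiceInfo]) -> Dict[str, Dict[str, List[str]]]:
    """Group voices as {kind: {culture: [names]}} with sorted keys and names."""
    groups = defaultdict(lambda: defaultdict(list))
    for name, culture, kind in voices:
        groups[kind][culture].append(name)
    # Convert to plain dicts in a stable, sorted order for display.
    return {
        kind: {culture: sorted(names) for culture, names in sorted(by_culture.items())}
        for kind, by_culture in sorted(groups.items())
    }


def format_listing(groups: Dict[str, Dict[str, List[str]]]) -> str:
    """Render the grouped voices as an indented text listing."""
    lines: List[str] = []
    for kind, by_culture in groups.items():
        lines.append(f"{kind}:")
        for culture, names in by_culture.items():
            for name in names:
                lines.append(f"  [{culture}] {name}")
    return "\n".join(lines)


voices = [
    ("Legacy EN B", "en-US", "Legacy"),
    ("Legacy EN A", "en-US", "Legacy"),
    ("Neural FR", "fr-FR", "Neural"),
]
print(format_listing(group_voices(voices)))
```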
Use Cases
This skill suits scenarios that need real-time feedback without the latency of cloud synthesis: reading back system notifications, creating audible alerts for your AI agent, or generating speech for desktop automation tasks. It is particularly useful for developers building offline-first applications that need accessibility features such as screen readers or spoken status updates.
Example Prompts
- "Speak the following text aloud using the best available French neural voice: 'Bonjour, le processus est terminé.'"
- "List all my currently installed SAPI5 voices so I can choose a new one for my agent."
- "Convert this status report into speech using the default voice and set the playback rate to 1."
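The third prompt asks for a playback rate of 1. SAPI5's `SpVoice.Rate` property (like the .NET `System.Speech` `SpeechSynthesizer.Rate` property) takes an integer from -10 to 10, where 0 is the normal speed, so on that scale 1 means slightly faster than the default rather than "1x". Assuming the skill passes the rate straight through to SAPI, which this page does not confirm, a small validation helper could look like this; `sapi_rate` is a hypothetical name.

```python
SAPI_MIN_RATE = -10    # slowest rate accepted by SAPI5
SAPI_MAX_RATE = 10     # fastest rate accepted by SAPI5
SAPI_DEFAULT_RATE = 0  # normal speaking speed


def sapi_rate(requested: int = SAPI_DEFAULT_RATE) -> int:
    """Check that `requested` is a valid SAPI5 rate and return it unchanged."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(requested, bool) or not isinstance(requested, int):
        raise TypeError(f"rate must be an integer, got {requested!r}")
    if not SAPI_MIN_RATE <= requested <= SAPI_MAX_RATE:
        raise ValueError(
            f"rate {requested} is outside SAPI5's range "
            f"{SAPI_MIN_RATE}..{SAPI_MAX_RATE}"
        )
    return requested


print(sapi_rate(1))  # 1: slightly faster than the default of 0
```

Rejecting out-of-range or fractional values, rather than silently clamping or rounding them, makes a request such as "rate 1.5" or "rate 20" fail loudly instead of producing unexpected speech.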
Tips & Limitations
Because this skill relies on the native Windows speech engine, output quality depends heavily on the voice packs installed through Windows Settings. For the most natural-sounding results, download the speech language packs for your language in Windows Settings (Time & language > Speech). Note that this skill works only on Windows. Since synthesis runs on local system resources, it needs no internet connection: your text stays on the machine, and generation speed does not depend on network or server latency.
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-korddie-sapi-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, code-execution
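If you prefer to enable the plugin from a script instead of pasting the clawhub.json snippet by hand, the sketch below merges the same entry into an existing file while leaving other plugins and settings untouched. This page does not say where clawhub.json lives or whether ClawHub expects other keys, so the path is left to the caller and `enable_plugin` is a hypothetical helper.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-korddie-sapi-tts"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update a plugin entry in clawhub.json and return the new config."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Create the "plugins" object and this plugin's entry only if they are missing.
    plugins = config.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `enable_plugin(Path("clawhub.json"))` from the directory that holds your config would write the same two settings shown above; running it twice is harmless because the entry is updated in place.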