What This Skill Does

The sapi-tts skill is a lightweight text-to-speech engine for Windows 10 and 11. Instead of a neural TTS model that needs GPU acceleration or a cloud API, it uses the Speech API built into Windows (SAPI5). Audio is therefore generated locally, almost immediately, and without any GPU load. This makes the skill a good fit for low-resource systems and for users who prioritize speed and privacy. It automatically selects the best available voice for your language preference and works with both the Neural voices available in Windows 11 and the older legacy SAPI5 voices.
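The skill's exact voice-selection rules are not documented here. The following is a minimal sketch of one plausible ranking, not the skill's implementation. It assumes each installed voice can be described by a name, a culture tag, and a Neural/legacy flag. The `Voice` type and `pick_best_voice` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Voice:
    name: str
    culture: str  # culture tag such as "fr-FR"
    neural: bool  # True for Neural voices, False for legacy SAPI5 voices


def pick_best_voice(voices: Sequence[Voice], preferred: Sequence[str]) -> Optional[Voice]:
    """Return the best voice for the first preferred culture that has any match.

    Assumed ranking: an exact culture match (fr-FR for fr-FR) beats a
    same-language match (fr-CA for fr-FR); within a match level, Neural
    beats legacy. If no preferred language is installed, fall back to the
    first Neural voice, then to the first voice of any kind.
    """
    for tag in preferred:
        lang = tag.split("-")[0].lower()
        candidates = []
        for v in voices:
            if v.culture.lower() == tag.lower():
                level = 2  # exact culture match
            elif v.culture.split("-")[0].lower() == lang:
                level = 1  # same language, different region
            else:
                continue
            candidates.append((level, v.neural, v))
        if candidates:
            # Highest match level first, then Neural over legacy.
            return max(candidates, key=lambda c: (c[0], c[1]))[2]
    if not voices:
        return None
    # max() keeps the first maximal item, so this is the first Neural voice if any.
    return max(voices, key=lambda v: v.neural)
```

Under this ranking, `pick_best_voice(installed, ["fr-FR"])` prefers a Neural fr-FR voice over a legacy fr-FR voice, and either of those over a fr-CA voice. Walking the preference list in order keeps the user's first language choice ahead of voice quality.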

Installation

To add this skill to your OpenClaw environment, run the following command in a terminal:

clawhub install openclaw/skills/skills/korddie/sapi-tts

After installation, confirm that the script is in your skills directory. To verify the setup, run the script with the -ListVoices flag. It lists every compatible SAPI5 voice registered on your Windows system, grouped by type (Neural or Legacy) and culture.
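To illustrate what a -ListVoices style listing involves, here is a sketch that enumerates SAPI5 voices through the COM automation interface. It is not the skill's own script. It assumes Python with pywin32 (`win32com.client`) on Windows. SAPI5 reports a voice's language as a hexadecimal LCID in its `Language` attribute (for example `409` for en-US), and mapping that to a culture name is left out. The `looks_neural` name check is a hypothetical heuristic, not how the skill classifies voices.

```python
import sys
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VoiceInfo:
    name: str
    language: str  # SAPI5 "Language" attribute: hexadecimal LCID(s), e.g. "409"
    neural: bool


def looks_neural(name: str) -> bool:
    # Hypothetical heuristic: names mentioning "Neural" or "Natural" count as Neural.
    lowered = name.lower()
    return "neural" in lowered or "natural" in lowered


def list_sapi_voices() -> List[VoiceInfo]:
    """Enumerate registered SAPI5 voices (Windows only, requires pywin32)."""
    if sys.platform != "win32":
        raise RuntimeError("SAPI5 is only available on Windows")
    import win32com.client  # pywin32, imported lazily so the module loads elsewhere

    tokens = win32com.client.Dispatch("SAPI.SpVoice").GetVoices()
    voices = []
    for i in range(tokens.Count):
        token = tokens.Item(i)
        name = token.GetDescription()
        voices.append(VoiceInfo(name, token.GetAttribute("Language"), looks_neural(name)))
    return voices


def group_voices(voices: List[VoiceInfo]) -> Dict[Tuple[str, str], List[str]]:
    """Group voice names by (type, language) for display."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for v in voices:
        kind = "Neural" if v.neural else "Legacy"
        groups[(kind, v.language)].append(v.name)
    return dict(groups)
```

On Windows, `group_voices(list_sapi_voices())` returns a mapping such as `{("Legacy", "409"): [...]}`, which can be printed one group per line.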

Use Cases

This skill suits scenarios that need spoken feedback without the latency of cloud synthesis: reading back system notifications, creating audible alerts for your AI agent, or generating speech for desktop automation tasks. It is also useful for developers building offline-first applications that need accessibility features such as screen-reader-style output or spoken status updates.

Example Prompts

"Speak the following text aloud using the best available French neural voice: 'Bonjour, le processus est terminé.'"
"List all my currently installed SAPI5 voices so I can choose a new one for my agent."
"Convert this status report into speech using the default voice and set the playback rate to 1."
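The meaning of "rate" in the skill itself is not specified here. The sketch below assumes it follows the SAPI5 `SpVoice.Rate` scale, where the rate is an integer from -10 to 10 and 0 is normal speed, so a rate of 1 is slightly faster than normal. It uses Python with pywin32 purely as an illustration. The `clamp_rate` and `speak` helpers are hypothetical, not the skill's interface.

```python
import sys

SAPI_MIN_RATE, SAPI_MAX_RATE = -10, 10  # SpVoice.Rate range; 0 is normal speed


def clamp_rate(rate: float) -> int:
    """Truncate to an integer and clamp into SAPI5's -10..10 rate range."""
    return max(SAPI_MIN_RATE, min(SAPI_MAX_RATE, int(rate)))


def speak(text: str, rate: float = 0) -> None:
    """Speak text with the default SAPI5 voice (Windows + pywin32 assumed)."""
    if sys.platform != "win32":
        raise RuntimeError("SAPI5 is only available on Windows")
    import win32com.client  # pywin32

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.Rate = clamp_rate(rate)
    voice.Speak(text)  # default flags: blocks until playback finishes
```

Clamping keeps an out-of-range request (for example 25) from being passed to the COM object unchanged; it is simply capped at the fastest supported rate.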

Tips & Limitations

Because this skill relies on the native Windows speech engine, output quality depends heavily on the voice packs installed through Windows Settings. For the most natural-sounding results, download the Speech language packs for your languages in the Windows settings. This skill works only on Windows. Since it runs entirely on local system resources, it needs no internet connection: your text never leaves the machine, and generation speed is unaffected by network or server latency.
