Official Verified utilities Safety 5/5

sapi-tts

Windows SAPI5 text-to-speech with Neural voices. Lightweight alternative to GPU-heavy TTS - zero GPU usage, instant generation. Auto-detects best available voice for your language. Works on Windows 10/11.

Why use this skill?

Add instant, zero-GPU text-to-speech to your OpenClaw agent using Windows SAPI5. Supports Neural voices and works on Windows 10 and 11.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/dexiaong/sapi-ttsl

What This Skill Does

The sapi-tts skill is a lightweight text-to-speech solution for Windows 10 and 11. It uses the built-in SAPI5 (Speech API version 5) interface, so OpenClaw agents can generate speech without consuming any GPU resources. Because synthesis runs locally, output starts almost immediately, which suits low-latency voice notifications, automated reading tasks, and accessible UI components. The skill auto-detects the best available voice for the user's system language, using high-quality Neural voices on Windows 11 and stable legacy voices on Windows 10.
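
The auto-detection described above could work roughly as follows. This is a minimal sketch, not the skill's actual code: the `Voice` record, the `pick_voice` function, the ranking order, and the heuristic of recognizing Neural voices by a "Neural" or "Natural" substring in the voice name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Voice:
    """Hypothetical record for an installed voice: display name plus culture tag (e.g. 'fr-FR')."""
    name: str
    culture: str


def is_neural(voice: Voice) -> bool:
    # Assumption: a neural voice can be recognized from its display name.
    lowered = voice.name.lower()
    return "neural" in lowered or "natural" in lowered


def pick_voice(voices: Sequence[Voice], language: str) -> Optional[Voice]:
    """Return the preferred voice for `language` ('fr' or 'fr-FR'), or None if none matches."""
    wanted = language.lower()
    primary = wanted.split("-")[0]

    def score(voice: Voice) -> tuple:
        culture = voice.culture.lower()
        same_language = culture.split("-")[0] == primary
        exact_region = culture == wanted
        # Ranking (assumed): language match is required, then neural, then exact region.
        return (same_language, is_neural(voice), exact_region)

    candidates = [v for v in voices if score(v)[0]]
    if not candidates:
        return None
    return max(candidates, key=score)
```

With a made-up voice list containing one legacy and one neural French voice, `pick_voice(voices, "fr-FR")` returns the neural one; if no voice matches the language at all, it returns `None` so the caller can decide on a fallback.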

Installation

To install the sapi-tts skill, follow these steps:

  1. Open your terminal in the designated OpenClaw skills directory.
  2. Execute the command: clawhub install openclaw/skills/skills/dexiaong/sapi-tts.
  3. Navigate to the installed directory and ensure the provided tts.ps1 PowerShell script is present.
  4. Make sure your PowerShell execution policy allows local scripts. Check the current policy with Get-ExecutionPolicy; if local scripts are blocked, run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser.
  5. You can test the installation by listing available voices with the command ./tts.ps1 -ListVoices.
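
Steps 4 and 5 can also be checked from a script. The sketch below is one hedged way to do that from Python, assuming Windows PowerShell is available as `powershell` on PATH and that tts.ps1 sits in the current directory. The helper names are illustrative and not part of the skill; only the `-ListVoices` switch comes from the steps above.

```python
import subprocess
import sys
from typing import List

# Policies under which PowerShell runs unsigned scripts created on the local machine.
ALLOWING_POLICIES = {"remotesigned", "unrestricted", "bypass"}


def policy_allows_local_scripts(policy: str) -> bool:
    """Interpret the text printed by Get-ExecutionPolicy."""
    return policy.strip().lower() in ALLOWING_POLICIES


def get_policy_command() -> List[str]:
    # Without -Scope, Get-ExecutionPolicy reports the effective policy.
    return ["powershell", "-NoProfile", "-Command", "Get-ExecutionPolicy"]


def list_voices_command(script_path: str = ".\\tts.ps1") -> List[str]:
    # -File runs the script; -ListVoices is the switch named in step 5.
    return ["powershell", "-NoProfile", "-File", script_path, "-ListVoices"]


def run(command: List[str]) -> str:
    """Run a command and return its standard output as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__" and sys.platform == "win32":
    policy = run(get_policy_command())
    if policy_allows_local_scripts(policy):
        print(run(list_voices_command()))
    else:
        print(f"Execution policy is {policy.strip()}; see step 4 before running tts.ps1.")
```

Keeping the command builders separate from `run` means the policy check can be exercised on any platform, while the actual PowerShell calls only happen on Windows.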

Use Cases

  • Real-time Voice Feedback: Providing instant audible alerts or confirmations for long-running agent tasks.
  • Accessibility: Converting textual data, logs, or chat history into audio for visually impaired users.
  • Agent Interaction: Adding a natural voice interface to your agent without depending on cloud TTS services or carrying the hardware overhead of local GPU-intensive TTS models.
  • Scripted Automation: Integrating voice synthesis directly into local Windows workflows where network-dependent cloud TTS APIs might be too slow or unreliable.
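
As an illustration of the voice-feedback and scripted-automation cases, here is a minimal sketch of a spoken notification. This page does not document how tts.ps1 accepts text, so the sketch does not call the skill's script. Instead it drives .NET's `System.Speech.Synthesis.SpeechSynthesizer` (a wrapper around SAPI) from Windows PowerShell. The `TTS_TEXT` environment variable and the helper names are hypothetical; the text is passed through the environment so it never has to be quoted inside the PowerShell command.

```python
import os
import subprocess
import sys
from typing import Dict, List

TEXT_ENV_VAR = "TTS_TEXT"  # hypothetical variable name used to hand the text to PowerShell

# PowerShell snippet that speaks the text found in the environment variable.
SPEAK_SCRIPT = (
    "Add-Type -AssemblyName System.Speech; "
    "$synth = New-Object System.Speech.Synthesis.SpeechSynthesizer; "
    f"$synth.Speak($env:{TEXT_ENV_VAR}); "
    "$synth.Dispose()"
)


def speak_command() -> List[str]:
    """Build the Windows PowerShell invocation (System.Speech ships with Windows PowerShell)."""
    return ["powershell", "-NoProfile", "-Command", SPEAK_SCRIPT]


def speak_environment(text: str) -> Dict[str, str]:
    """Copy the current environment and add the text to be spoken."""
    env = dict(os.environ)
    env[TEXT_ENV_VAR] = text
    return env


def notify(text: str) -> bool:
    """Speak `text` on Windows and return True; elsewhere print it and return False."""
    if sys.platform != "win32":
        print(text)
        return False
    subprocess.run(speak_command(), env=speak_environment(text), check=True)
    return True
```

A long-running task could end with `notify("Task completed successfully.")`. `Speak` blocks until playback finishes, which keeps the sketch simple.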

Example Prompts

  1. "Speak the following text aloud using the best available neural voice: 'Task completed successfully.'"
  2. "List all my installed Windows voices to see if I have a high-quality French neural voice available."
  3. "Convert this status report into speech and play it back to me immediately."

Tips & Limitations

  • Neural vs Legacy: Neural voices are only available natively on Windows 11. On Windows 10, the output falls back to legacy SAPI5 voices, which may sound more synthetic. Ensure your Windows speech settings are up to date.
  • Performance: Because this uses local system calls, there is no network round trip and speech starts almost immediately. Total synthesis time still grows with text length.
  • Zero GPU Usage: Perfect for systems where your GPU is already fully utilized by LLMs or image generation models.
  • Security: The script runs locally via PowerShell; ensure it remains in a trusted directory.

Metadata

Author: @dexiaong
Stars: 1100
Views: 0
Updated: 2026-02-17

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-dexiaong-sapi-ttsl": {
      "enabled": true,
      "auto_update": true
    }
  }
}
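
If clawhub.json already lists other plugins, the entry has to be merged into the existing "plugins" object rather than pasted as a second one. The sketch below shows one way to do that merge. It assumes only the structure visible in the snippet above ("plugins" mapping a key to "enabled" and "auto_update"); the function name is hypothetical and the plugin key is passed in, so copy it from the snippet.

```python
import json
from typing import Any, Dict


def enable_plugin(config: Dict[str, Any], plugin_key: str) -> Dict[str, Any]:
    """Return a copy of `config` with `plugin_key` enabled, leaving other plugins untouched."""
    plugins = dict(config.get("plugins", {}))
    entry = dict(plugins.get(plugin_key, {}))
    entry.update({"enabled": True, "auto_update": True})
    plugins[plugin_key] = entry
    merged = dict(config)
    merged["plugins"] = plugins
    return merged


if __name__ == "__main__":
    # Made-up existing configuration with one unrelated plugin.
    existing = {"plugins": {"some-other-plugin": {"enabled": False}}}
    updated = enable_plugin(existing, "your-plugin-key")
    print(json.dumps(updated, indent=2))
```

The function copies the dictionaries it changes, so the original configuration object can still be compared against the result before anything is written back to disk.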

Tags (AI)

#tts #windows #accessibility #speech #automation
Safety Score: 5/5

Flags: file-read, code-execution