Official | Verified | media | Safety 4/5

edge-tts-uvx

Text-to-speech conversion using `uvx edge-tts` for generating audio from text. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/al-one/edge-tts-uvx

What This Skill Does

The edge-tts-uvx skill lets OpenClaw AI synthesize natural-sounding speech from text using Microsoft's neural TTS voices. It runs the edge-tts utility through uvx to generate MP3 audio files from a text string. It supports voices in many languages and gives control over prosody: speaking rate, volume, and pitch. It can also write subtitles synchronized with the generated audio.
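
As a rough sketch of how these options could map onto a command line, the Python helper below assembles an argument list for `uvx edge-tts`. The flags used (`--text`, `--voice`, `--rate`, `--volume`, `--pitch`, `--write-media`, `--write-subtitles`) are edge-tts command-line options; the function name `build_tts_command` and its default values are illustrative assumptions, not part of the skill. The helper only builds the command and does not run it.

```python
from typing import List, Optional


def build_tts_command(
    text: str,
    output_path: str,
    voice: str = "en-US-AriaNeural",  # voice name taken from the example prompts
    rate: str = "+0%",                # signed percentage, e.g. "+20%"
    volume: str = "+0%",              # signed percentage, e.g. "-10%"
    pitch: str = "+0Hz",              # signed offset in Hz, e.g. "-5Hz"
    subtitles_path: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) an argument list for `uvx edge-tts`."""
    cmd = [
        "uvx", "edge-tts",
        "--text", text,
        "--voice", voice,
        # The --flag=value form keeps negative values such as "-10%"
        # from being mistaken for a separate option.
        f"--rate={rate}",
        f"--volume={volume}",
        f"--pitch={pitch}",
        "--write-media", output_path,
    ]
    if subtitles_path is not None:
        cmd += ["--write-subtitles", subtitles_path]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where uv is installed and the network is reachable.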

Installation

To add this text-to-speech capability to your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/al-one/edge-tts-uvx

Make sure uv is installed on your system: the skill uses uvx to run the TTS tool in an ephemeral environment.
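
One way to verify this prerequisite before invoking the skill is to look for the `uvx` executable on the PATH. The snippet below is a minimal sketch; the function name `uvx_available` and the injectable `which` parameter (there only to make the check easy to test) are assumptions for illustration.

```python
import shutil
from typing import Callable, Optional


def uvx_available(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Return True if an executable named `uvx` can be found on the PATH."""
    return which("uvx") is not None


if __name__ == "__main__":
    if uvx_available():
        print("uvx found")
    else:
        print("uvx not found: install uv before using this skill")
```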

Use Cases

This skill is designed for situations where listening is more practical than reading. Typical use cases include:

  • Accessibility: Creating audio versions of long-form reports or articles for users who prefer listening.
  • Content Creation: Generating voiceovers for video projects or presentations with professional-grade neural voices.
  • Multitasking Environments: Providing spoken updates while the user is occupied with tasks like driving, cooking, or exercising, allowing them to remain informed without needing to look at a screen.
  • Language Learning: Practicing pronunciation or listening comprehension using diverse international accents.

Example Prompts

  1. "TTS: Please convert the following paragraph into an audio file using the en-US-AriaNeural voice: [Insert Text Here]."
  2. "Speak this article to me: 'The future of AI is collaborative.' Use a 20% faster speaking rate for a quick briefing."
  3. "Generate a summary of my project status and save it as an mp3 file so I can listen to it while I walk to my meeting."
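
A request such as "20% faster" in the second prompt has to become a signed value before it reaches edge-tts, which takes rate and volume as signed percentages (for example `+20%`) and pitch as a signed offset in Hz (for example `-5Hz`). The helpers below are a sketch of that conversion; the names `format_rate` and `format_pitch` are hypothetical, and how the skill itself interprets such prompts is not documented here.

```python
def format_rate(percent: int) -> str:
    """Turn a relative speed change into a signed percentage string."""
    # The "+d" format always emits a sign, so 0 becomes "+0%".
    return f"{percent:+d}%"


def format_pitch(hertz: int) -> str:
    """Turn a relative pitch change into a signed Hz string."""
    return f"{hertz:+d}Hz"


# "Use a 20% faster speaking rate" from the example prompt:
rate_argument = f"--rate={format_rate(20)}"
```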

Tips & Limitations

  • Voice Selection: Refer to the provided voice list (or the output of uvx edge-tts --list-voices) to make sure the selected voice matches your target language and desired tone (e.g., cheerful vs. authoritative).
  • Resource Management: This skill writes files, so make sure the output directory is writable and has enough free disk space for the generated MP3 files.
  • Formatting: The tool outputs standard MP3 files, which most media players and browsers can play.
  • Limitations: The skill relies on Microsoft's online Edge TTS endpoints, so an active internet connection is required; speech is synthesized on the remote service, not locally.
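
The resource-management tip can be checked up front. The sketch below tests whether a directory exists, is writable, and has a minimum amount of free space; the function name `can_write_output` and the 1 MiB default threshold are arbitrary assumptions, not values specified by the skill.

```python
import os
import shutil


def can_write_output(directory: str, min_free_bytes: int = 1024 * 1024) -> bool:
    """Return True if `directory` exists, is writable, and has enough free space."""
    if not os.path.isdir(directory):
        return False
    if not os.access(directory, os.W_OK):
        return False
    # disk_usage reports total, used, and free bytes for the
    # filesystem that contains the directory.
    return shutil.disk_usage(directory).free >= min_free_bytes
```

Calling `can_write_output(tempfile.gettempdir())` before generating audio gives an early, readable failure instead of an error partway through synthesis.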

Metadata

Author: @al-one
Stars: 4473
Views: 0
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-al-one-edge-tts-uvx": {
      "enabled": true,
      "auto_update": true
    }
  }
}
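
If clawhub.json already contains other plugins, pasting the fragment by hand risks overwriting them. The following sketch merges the entry programmatically instead. The plugin ID and the `enabled` and `auto_update` keys come from the fragment above; the function names and the assumption that the file is plain JSON with a top-level `plugins` object are illustrative.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-al-one-edge-tts-uvx"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of `config` with the plugin entry added or replaced."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))  # keep any existing plugins
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path: str) -> None:
    """Read the JSON file at `path` (if present), add the plugin, write it back."""
    config_path = Path(path)
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    updated = enable_plugin(config)
    config_path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```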

Tags(AI)

#tts #voice #audio #speech #accessibility
Safety Score: 4/5

Flags: file-write, external-api