Official | Verified | media | Safety 4/5

edge-tts

Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.

Why use this skill?

Integrate high-quality neural text-to-speech into OpenClaw with the edge-tts skill. Features custom voices, speed control, and subtitle support for accessible voice output.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/i3130002/edge-tts


What This Skill Does

The edge-tts skill connects the OpenClaw agent to Microsoft's neural text-to-speech service. Using the node-edge-tts package, it converts plain text into natural-sounding speech. Voice, speech rate, and pitch are all configurable. The skill can also generate subtitles, which is useful when you need synchronized text alongside the audio file. It runs either as a built-in agent tool or directly from the command line, for users who need finer control over output parameters.

Installation

To add this skill to your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/i3130002/edge-tts

Once it is installed, go to the skill's scripts directory and run npm install to install the dependencies. The skill then works within existing agent workflows.

Use Cases

The skill covers several core scenarios:

Accessibility: Providing auditory feedback for users with visual impairments, allowing them to consume generated text content.
Multitasking: Enabling "hands-free" mode where users can listen to reports, summaries, or articles while driving or performing physical tasks.
Content Creation: Generating voice-overs for video content where speed and tone adjustments are required to match specific project requirements.
Dynamic UI Feedback: Enhancing the agent interaction experience by responding with voice instead of just visual text.

Example Prompts

"tts Can you read this summary of the meeting notes in a professional British male voice?"
"I need a voice-over for my project. Use the AriaNeural voice, increase the speed by 20%, and generate the output as an MP3 file."
"tts Please read this article aloud slowly so I can follow along while I practice my French; use the DeniseNeural voice."
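The second prompt above asks for a named voice and a 20% speed increase. The following is a minimal sketch of how such a request might be turned into settings. It assumes that the underlying node-edge-tts package accepts signed percentage strings such as "+20%" for rate and pitch, and option names like voice, rate, pitch, and saveSubtitles; check that package's README before relying on this. The names TtsRequest, signedPercent, and buildOptions are hypothetical, and the full voice name en-US-AriaNeural is an assumption.

```typescript
// Hypothetical shape of a parsed user request; not part of the skill's API.
interface TtsRequest {
  voice: string;          // e.g. "en-US-AriaNeural" (assumed full voice name)
  speedChangePct: number; // +20 means 20% faster, -10 means 10% slower
  pitchChangePct: number; // 0 leaves the pitch unchanged
  subtitles: boolean;
}

// Format a number as a signed percentage string, e.g. 20 -> "+20%".
function signedPercent(value: number): string {
  const rounded = Math.round(value);
  const sign = rounded < 0 ? "-" : "+";
  return `${sign}${Math.abs(rounded)}%`;
}

// Build an options object. The field names are assumed to match what
// node-edge-tts accepts; they are not confirmed by this page.
function buildOptions(req: TtsRequest) {
  return {
    voice: req.voice,
    rate: signedPercent(req.speedChangePct),
    pitch: signedPercent(req.pitchChangePct),
    saveSubtitles: req.subtitles,
  };
}

const options = buildOptions({
  voice: "en-US-AriaNeural",
  speedChangePct: 20,
  pitchChangePct: 0,
  subtitles: false,
});
console.log(options);
```

Keeping the conversion in a small pure function makes it easy to check the formatting separately from any network call to the speech service.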

Tips & Limitations

Voice Selection: Run the --list-voices command periodically, because Microsoft updates its neural voice library over time.
Performance: The service is fast, but very long texts should be split into chunks to keep requests stable and reduce latency.
Keywords: The agent automatically filters 'tts' from the output text so that the keyword is not spoken. Avoid conversational phrasing that could trigger the skill unintentionally.
Proxy: If access to Microsoft services is restricted in your region, use the --proxy flag to route requests through your internal network or another stable endpoint.
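The Performance tip above recommends processing long texts in chunks. One way to do that, sketched here under stated assumptions, is to split at sentence boundaries and pack sentences greedily up to a size limit. The function names and the 2000-character default are illustrative only; this page does not state an actual service limit, and this is not necessarily how the skill itself splits text.

```typescript
// Split text into sentences: each match ends at ., ! or ? followed by
// whitespace or end of input; any trailing remainder is kept as a sentence.
function splitSentences(text: string): string[] {
  const matches = text.match(/[\s\S]*?[.!?]+(?=\s|$)|[\s\S]+$/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Pack sentences into chunks of at most maxChars characters.
// maxChars is an illustrative default, not a documented service limit.
function chunkText(text: string, maxChars = 2000): string[] {
  const chunks: string[] = [];
  let current = "";

  for (const sentence of splitSentences(text)) {
    // Hard-split any single sentence that exceeds the limit on its own.
    const pieces: string[] = [];
    for (let i = 0; i < sentence.length; i += maxChars) {
      pieces.push(sentence.slice(i, i + maxChars));
    }
    for (const piece of pieces) {
      if (current.length === 0) {
        current = piece;
      } else if (current.length + 1 + piece.length <= maxChars) {
        current += " " + piece; // still fits: append to the current chunk
      } else {
        chunks.push(current);   // does not fit: start a new chunk
        current = piece;
      }
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

const sample = "First sentence. Second one! A third? Done.";
console.log(chunkText(sample, 30));
// → ["First sentence. Second one!", "A third? Done."]
```

Each chunk can then be synthesized as a separate request and the resulting audio files played or joined in order. Splitting at sentence boundaries rather than at a fixed character offset avoids cutting a word or phrase in the middle.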


Metadata

Author: @i3130002

Stars: 2387

Updated: 2026-03-09


Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-i3130002-edge-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#tts #audio #speech #accessibility #voice

Safety Score: 4/5

Flags: network-access, file-write, file-read