edge-tts
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Why use this skill?
Integrate high-quality neural text-to-speech into OpenClaw with the edge-tts skill. Features custom voices, speed control, and subtitle support for accessible voice output.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/i3130002/edge-tts
What This Skill Does
The edge-tts skill provides a robust bridge between the OpenClaw agent and Microsoft's high-quality neural text-to-speech service. By leveraging the node-edge-tts package, this skill transforms plain text into natural-sounding human speech. It is designed to be highly configurable, allowing users to fine-tune audio characteristics such as voice selection, speech rate, and pitch. Beyond simple audio playback, it includes support for subtitle generation, making it an excellent choice for creators needing synchronized text data alongside their audio files. The skill operates via a built-in agent tool or through direct command-line execution for power users needing granular control over output parameters.
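As a rough illustration of the voice, rate, pitch, and subtitle settings described above, the sketch below assembles an options object. The helper names (`toSignedPercent`, `buildSpeechOptions`) and the `SpeechOptions` type are hypothetical and not part of this skill. The field names and the signed-percentage format for rate and pitch are assumptions modeled on how node-edge-tts options are commonly written; check them against the package version you have installed.

```typescript
// Hypothetical shape for TTS settings; field names are assumptions, not the skill's API.
interface SpeechOptions {
  voice: string;          // e.g. "en-US-AriaNeural"
  rate: string;           // signed percentage, e.g. "+20%"
  pitch: string;          // signed percentage, e.g. "-10%"
  saveSubtitles: boolean; // whether a subtitle file should be written too
}

// Format a numeric change as a signed percentage string: 20 -> "+20%", -10 -> "-10%".
function toSignedPercent(delta: number): string {
  const rounded = Math.round(delta);
  const sign = rounded < 0 ? "-" : "+";
  return `${sign}${Math.abs(rounded)}%`;
}

// Collect the settings in one place so they can be passed to whatever TTS call is used.
function buildSpeechOptions(
  voice: string,
  rateDelta: number = 0,
  pitchDelta: number = 0,
  saveSubtitles: boolean = false
): SpeechOptions {
  return {
    voice,
    rate: toSignedPercent(rateDelta),
    pitch: toSignedPercent(pitchDelta),
    saveSubtitles,
  };
}

// Example: AriaNeural, 20% faster, unchanged pitch, with subtitles.
const exampleOptions = buildSpeechOptions("en-US-AriaNeural", 20, 0, true);
console.log(exampleOptions);
```

Keeping the percentage formatting in one small function avoids scattering string concatenation across call sites, which is where sign mistakes tend to creep in.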
Installation
To integrate this capability into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/i3130002/edge-tts
Once installed, navigate to the skill's scripts directory and run npm install to fetch its dependencies. The skill then plugs into existing agent workflows without further setup.
Use Cases
This skill is highly versatile and serves several core scenarios:
- Accessibility: Providing auditory feedback for users with visual impairments, allowing them to consume generated text content.
- Multitasking: Enabling "hands-free" mode where users can listen to reports, summaries, or articles while driving or performing physical tasks.
- Content Creation: Generating voice-overs for video content where speed and tone adjustments are required to match specific project requirements.
- Dynamic UI Feedback: Enhancing the agent interaction experience by responding with voice instead of just visual text.
Example Prompts
- "tts Can you read this summary of the meeting notes in a professional British male voice?"
- "I need a voice-over for my project. Use the AriaNeural voice, increase the speed by 20%, and generate the output as an MP3 file."
- "tts Please read this article aloud slowly so I can follow along while I practice my French; use the DeniseNeural voice."
Tips & Limitations
- Voice Selection: Run the --list-voices command periodically, as Microsoft updates its neural voice library frequently.
- Performance: While the service is fast, extremely long texts should be processed in chunks to ensure stability and reduce latency.
- Keywords: The agent automatically filters the 'tts' keyword from the output text so the audio is clean. Avoid conversational phrases that might unintentionally trigger the skill or interfere with how the text is spoken.
- Proxy: If you are in a region with restricted access to Microsoft services, use the --proxy flag to route requests through your internal network or a stable endpoint.
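The chunking advice in the Performance tip can be sketched as follows. This is a minimal example, not the skill's own implementation: the function name `chunkText` and the default limit of 1000 characters are assumptions chosen for illustration. It groups whole sentences into chunks under the limit and hard-splits any single sentence that is longer than the limit.

```typescript
// Split text into chunks of at most maxLen characters, preferring sentence boundaries.
// Hypothetical helper; the 1000-character default is an arbitrary illustrative value.
function chunkText(text: string, maxLen: number = 1000): string[] {
  if (maxLen <= 0) {
    throw new RangeError("maxLen must be positive");
  }
  // A sentence is a run of text ending in ., ! or ?, or a trailing run with no terminator.
  const matches = text.match(/[^.!?]*[.!?]+|[^.!?]+$/g) || [];
  const sentences = matches.map((s) => s.trim()).filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    let rest = sentence;
    // Hard-split a sentence that cannot fit in a single chunk.
    while (rest.length > maxLen) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, maxLen));
      rest = rest.slice(maxLen);
    }
    if (!rest) continue;
    const candidate = current ? `${current} ${rest}` : rest;
    if (candidate.length <= maxLen) {
      current = candidate; // sentence still fits in the chunk being built
    } else {
      chunks.push(current); // close the full chunk and start a new one
      current = rest;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkText("One. Two. Three.", 10));
```

Each returned chunk can then be synthesized on its own and the audio files played or joined in order. Splitting on sentence boundaries keeps pauses natural; a character-count limit is only a rough proxy for whatever limit the service actually enforces.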
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-i3130002-edge-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: network-access, file-write, file-read