Official Verified media Safety 4/5

lh-edge-tts

Text-to-speech conversion using Python edge-tts for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.

Why use this skill?

Convert text to high-quality neural speech with the OpenClaw lh-edge-tts skill. Supports multiple languages, custom pitch, speed, and subtitle generation.


What This Skill Does

The lh-edge-tts skill uses Microsoft Edge's online neural text-to-speech service to convert text into natural-sounding audio, giving OpenClaw users a way to generate audio output programmatically. Because it integrates directly with the edge-tts Python library, it supports a large catalog of neural voices across many languages, control over speaking rate, pitch, and volume, and generation of synchronized subtitle files (SRT or VTT). In short, it lets an agent deliver its text output as speech.
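As a rough illustration of the rate, pitch, and volume controls, the sketch below calls the edge-tts library directly. It assumes a recent edge-tts release whose Communicate class accepts rate, volume, and pitch keyword arguments as signed strings (for example "+10%" or "+5Hz"). The helper names prosody and synthesize are hypothetical and are not part of the skill.

```python
def prosody(rate_pct: int = 0, pitch_hz: int = 0, volume_pct: int = 0) -> dict[str, str]:
    """Format numeric adjustments as the signed strings edge-tts expects.

    Rate and volume are percentages ("+20%", "-10%"); pitch is in hertz ("+5Hz").
    """
    return {
        "rate": f"{rate_pct:+d}%",
        "volume": f"{volume_pct:+d}%",
        "pitch": f"{pitch_hz:+d}Hz",
    }


async def synthesize(text: str, voice: str, out_path: str, **adjust: int) -> None:
    """Synthesize text to an MP3 file via Microsoft's online service (network required)."""
    import edge_tts  # lazy import so prosody() works even without the package installed

    communicate = edge_tts.Communicate(text, voice, **prosody(**adjust))
    await communicate.save(out_path)


# Usage (assumes network access and `pip install edge-tts`):
#   import asyncio
#   asyncio.run(synthesize("Hello from edge-tts.", "en-US-AriaNeural", "hello.mp3",
#                          rate_pct=-10, pitch_hz=5))
```

Keeping the string formatting in its own function means the parameter handling can be checked without a network connection.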

Installation

To integrate this skill into your agent, use the OpenClaw command-line interface. Execute the following command in your terminal:

clawhub install openclaw/skills/skills/liuhedev/lh-edge-tts

Make sure Python 3 is available and that the dependencies listed in the source repository, including the edge-tts package, are installed. Once the skill is installed, the agent recognizes the 'tts' trigger and handles matching requests automatically.
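A quick prerequisite check can be scripted as follows. This is a hypothetical helper, not part of clawhub or the skill; it only looks for an importable edge_tts module (installed from PyPI as edge-tts) and the edge-tts command-line tool that package provides. The source repository may list further dependencies.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report which edge-tts prerequisites are present in this environment."""
    return {
        # The PyPI package "edge-tts" installs the importable module "edge_tts".
        "edge_tts_module": importlib.util.find_spec("edge_tts") is not None,
        # The same package also installs an "edge-tts" command-line tool.
        "edge_tts_cli": shutil.which("edge-tts") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
    # If anything is missing, a typical fix is: python3 -m pip install edge-tts
```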

Use Cases

This skill is ideal for several scenarios:

Accessibility: Converting long articles, documents, or chat responses into audio for visually impaired users or those who prefer auditory learning.
Multitasking: Enabling users to consume AI-generated information while driving, cooking, or exercising without needing to look at a screen.
Content Creation: Generating voiceovers for video projects or presentations by outputting high-quality audio files alongside subtitle files.
Language Learning: Using natural-sounding neural voices to practice listening comprehension in various languages.
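For the voiceover use case, subtitles are built from timing events that edge-tts emits while streaming audio. edge-tts ships its own SubMaker class for this, which you would normally use; the hand-rolled sketch below only shows the conversion involved. It assumes, as in recent edge-tts releases, that Communicate.stream() yields dicts whose type is "WordBoundary" or "SentenceBoundary" for timing events, each carrying offset and duration in 100-nanosecond ticks plus the spoken text. The Cue type and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

TICKS_PER_MS = 10_000  # assumed unit: 100-nanosecond ticks, so 10,000 ticks = 1 ms


@dataclass
class Cue:
    offset: int    # start time, in 100-ns ticks
    duration: int  # length, in 100-ns ticks
    text: str


def cues_from_events(events: Iterable[dict]) -> List[Cue]:
    """Keep only boundary events (assumed keys: type, offset, duration, text)."""
    return [
        Cue(e["offset"], e["duration"], e["text"])
        for e in events
        if e.get("type") in ("WordBoundary", "SentenceBoundary")
    ]


def srt_timestamp(ticks: int) -> str:
    """Convert 100-ns ticks to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = ticks // TICKS_PER_MS
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Render numbered SRT blocks separated by blank lines, one per cue."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = srt_timestamp(cue.offset)
        end = srt_timestamp(cue.offset + cue.duration)
        blocks.append(f"{index}\n{start} --> {end}\n{cue.text}\n")
    return "\n".join(blocks)
```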

Example Prompts

"tts Read the latest technical documentation summary to me using the English Aria voice at a slightly slower speed."
"tts Convert this story into an audio file using the Chinese Yunyang voice and save it to my downloads folder."
"tts Please read back the summary of the meeting notes, but use a faster speed so I can review it quickly while I drive."
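Each of these prompts ultimately reduces to a few synthesis parameters. The sketch below shows one plausible translation into an edge-tts CLI invocation. en-US-AriaNeural and zh-CN-YunyangNeural are real edge-tts voice names, but the helper, the friendly-name table, and the choice of -10% for "slightly slower" are hypothetical, not the skill's documented parsing logic.

```python
from typing import List, Optional

# Real edge-tts voice ShortNames; the friendly keys are hypothetical prompt labels.
VOICES = {
    "english aria": "en-US-AriaNeural",
    "chinese yunyang": "zh-CN-YunyangNeural",
}


def build_edge_tts_command(text: str, voice: str, out_path: str,
                           rate_pct: int = 0,
                           subtitles_path: Optional[str] = None) -> List[str]:
    """Build an argv list for the edge-tts CLI (run it with subprocess.run)."""
    cmd = [
        "edge-tts",
        "--voice", VOICES.get(voice.lower(), voice),  # fall back to a raw ShortName
        # Use the --rate=VALUE form: a bare "-10%" would be parsed as an option.
        f"--rate={rate_pct:+d}%",
        "--text", text,
        "--write-media", out_path,
    ]
    if subtitles_path is not None:
        cmd += ["--write-subtitles", subtitles_path]
    return cmd


# Hypothetical translation of the first example prompt ("slightly slower" -> -10%):
#   import subprocess
#   cmd = build_edge_tts_command(summary, "English Aria", "summary.mp3", rate_pct=-10)
#   subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems when the text contains spaces or punctuation.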

Tips & Limitations

Rate Tuning: Adjust speed with signed percentages (e.g., +20% or -10%). Avoid going beyond +50%, as clarity may degrade.
Voice Selection: Run edge-tts --list-voices to see the currently available neural voices, since Microsoft updates its voice catalog periodically.
Network Dependence: This skill needs an active internet connection, because edge-tts sends text to Microsoft's online speech service; offline use is not currently supported.
Performance: While latency is generally low, very long text inputs should be processed in segments to ensure stability.
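The rate and segmentation tips can be combined into two small helpers. This is a sketch under stated assumptions: clamping slowdowns to -50% mirrors the +50% guideline but is an assumption, the 2,000-character segment size is an arbitrary choice rather than a documented edge-tts limit, and both function names are hypothetical.

```python
import re
from typing import List

MAX_RATE_PCT = 50  # from the rate-tuning tip; the symmetric lower bound is an assumption

# A sentence (ending in Latin or CJK punctuation, plus trailing spaces) or a final fragment.
_SENTENCE = re.compile(r"[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+")


def clamp_rate(rate_pct: int, limit: int = MAX_RATE_PCT) -> str:
    """Clamp a speed adjustment to +/-limit and format it for edge-tts."""
    clamped = max(-limit, min(limit, rate_pct))
    return f"{clamped:+d}%"


def split_text(text: str, max_chars: int = 2000) -> List[str]:
    """Greedily pack whole sentences into segments of at most max_chars."""
    segments: List[str] = []
    current = ""
    for piece in _SENTENCE.findall(text):
        # Hard-wrap a single sentence that is longer than the limit on its own.
        while len(piece) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(piece[:max_chars])
            piece = piece[max_chars:]
        if len(current) + len(piece) > max_chars:
            segments.append(current)
            current = piece
        else:
            current += piece
    if current:
        segments.append(current)
    # Drop whitespace-only segments and trim the rest.
    return [s.strip() for s in segments if s.strip()]
```

Each segment can then be synthesized separately (for example, one audio file per segment), so a failure affects only one piece of a long document.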
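The voice catalog that edge-tts --list-voices prints is also available from Python through the library's async list_voices() function, which returns one dict per voice with keys such as ShortName, Locale, and Gender. The filter_voices and fetch_voices helpers below are hypothetical conveniences for picking, say, a male zh-CN voice; only fetch_voices touches the network.

```python
from typing import Any, Dict, List, Optional


def filter_voices(voices: List[Dict[str, Any]], locale_prefix: str,
                  gender: Optional[str] = None) -> List[str]:
    """Return sorted ShortNames whose Locale starts with locale_prefix (and match gender)."""
    names = []
    for voice in voices:
        if not voice.get("Locale", "").startswith(locale_prefix):
            continue
        if gender is not None and voice.get("Gender") != gender:
            continue
        names.append(voice["ShortName"])
    return sorted(names)


async def fetch_voices() -> List[Dict[str, Any]]:
    """Download the current voice catalog from Microsoft's service (network required)."""
    import edge_tts  # lazy import: only this function needs the package

    return await edge_tts.list_voices()


# Usage (assumes network access and `pip install edge-tts`):
#   import asyncio
#   voices = asyncio.run(fetch_voices())
#   print(filter_voices(voices, "zh-CN", gender="Male"))
```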


Metadata

Author: @liuhedev

Stars: 1601

Updated: 2026-02-27


Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-liuhedev-lh-edge-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#tts #audio #speech #accessibility #voiceover

Safety Score: 4/5

Flags: file-write, file-read, external-api

Related Skills

baoyu-post-to-x

Posts content and articles to X (Twitter). Supports regular posts with images/videos and X Articles (long-form Markdown). Uses a real Chrome browser via CDP to bypass anti-automation checks. Use when the user asks to "post to X", "tweet", "publish to Twitter", or "share on X".
