Official | Verified | communication | Safety 4/5

voice-tts

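Generates high-quality Chinese voice messages with edge-tts and sends them. Use it when the user asks for a voice message, a voice reply, TTS, text-to-speech, a voice announcement, or a spoken message. It supports multiple Chinese voices (male, female, and dialect), lets you adjust speaking rate and pitch, and is suited to sending voice messages on channels such as Feishu, Telegram, and Discord.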


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/binbin1213/ms-voice-tts

What This Skill Does

The voice-tts skill integrates Microsoft's edge-tts engine into the OpenClaw ecosystem so that the agent can synthesize high-quality, natural-sounding Chinese speech from text and reply with audio as well as text. On Feishu, Telegram, or Discord, the skill turns text input into opus-format audio files and sends them as native voice messages. It supports a range of neural voices, including male and female voices and regional accents such as Northeast Mandarin and Taiwanese Mandarin, for a professional, localized communication experience.

Installation

To use this skill, ensure you have the edge-tts Python library installed. We strongly recommend using pipx to manage the installation to keep your system Python environment clean and avoid dependency conflicts.

  1. Run pipx install edge-tts (or pip install --user edge-tts on Linux systems where pipx is unavailable).
  2. Verify the installation by running edge-tts --list-voices in your terminal.
  3. Make sure the OpenClaw agent can access the skill's script directory so that it can run tts.sh automatically.
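
The verification step above can also be scripted. The following is a minimal sketch, assuming the edge-tts executable lands on PATH after installation; the helper names find_edge_tts and list_voices are hypothetical and not part of the skill. Listing voices most likely needs an internet connection, so only the PATH lookup is safe to run offline.

```python
import shutil
import subprocess
from typing import Optional


def find_edge_tts() -> Optional[str]:
    """Return the full path of the edge-tts executable, or None if it is not on PATH."""
    return shutil.which("edge-tts")


def list_voices(executable: str, timeout: float = 30.0) -> str:
    """Run `edge-tts --list-voices` and return its raw text output.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        [executable, "--list-voices"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


def installation_status() -> str:
    """Describe whether edge-tts was found, without touching the network."""
    path = find_edge_tts()
    if path is None:
        return "edge-tts not found; try: pipx install edge-tts"
    return f"edge-tts found at {path}"
```

Calling installation_status() gives a quick offline check; list_voices(find_edge_tts()) can then be used to confirm that the Chinese voices you need are actually offered.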

Use Cases

  • Proactive Notifications: Automatically send voice reminders for meetings or project deadlines in team chat channels like Feishu.
  • Customer Service Automation: Provide warm, human-sounding voice responses in Telegram or Discord customer support threads.
  • Accessibility Enhancements: Convert long-form text reports into audio briefs for users who prefer listening to content while on the move.
  • Regional Localization: Use regional voices (e.g., zh-TW-HsiaoChenNeural) to better engage users in a particular region.

Example Prompts

  1. "Send a voice message to the team channel saying 'The project review meeting will start in 5 minutes, please be prepared.'"
  2. "Reply to this user on Telegram with an audio message: 'Thank you for your feedback, we have received your request and will address it shortly.'"
  3. "Use the Xiaoxiao voice to announce this text in Discord: 'Welcome everyone, today's focus is on optimizing our AI agent workflows.'"
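
A prompt that names a voice, like the third one above, ultimately has to map to an edge-tts voice name. The sketch below shows one way that mapping and the resulting command line could look. It is an illustration only: the alias table and the build_tts_command helper are hypothetical, and the skill's own tts.sh may work differently. The flags used (--voice, --text, --rate, --pitch, --write-media) belong to the edge-tts command-line tool. edge-tts writes MP3 by default, so any conversion to opus is left to the skill and is not shown here.

```python
from typing import List

# Hypothetical aliases mapped to edge-tts voice names.
# Confirm availability with `edge-tts --list-voices` before relying on them.
VOICE_ALIASES = {
    "xiaoxiao": "zh-CN-XiaoxiaoNeural",            # female, Mainland Mandarin
    "yunxi": "zh-CN-YunxiNeural",                  # male, Mainland Mandarin
    "northeast": "zh-CN-liaoning-XiaobeiNeural",   # Northeast Mandarin
    "taiwan": "zh-TW-HsiaoChenNeural",             # Taiwanese Mandarin
}


def build_tts_command(
    text: str,
    voice_alias: str = "xiaoxiao",
    output_path: str = "voice.mp3",
    rate: str = "+0%",
    pitch: str = "+0Hz",
) -> List[str]:
    """Build the argument list for one edge-tts invocation.

    rate and pitch are passed as --rate=VALUE and --pitch=VALUE so that
    negative values such as -10% are not mistaken for separate options.
    """
    voice = VOICE_ALIASES[voice_alias.lower()]
    return [
        "edge-tts",
        "--voice", voice,
        "--text", text,
        f"--rate={rate}",
        f"--pitch={pitch}",
        "--write-media", output_path,
    ]
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with punctuation in the text.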

Tips & Limitations

  • Optimization: Keep your input text between 50 and 300 characters for the best balance of generation speed and audio quality.
  • File Management: The skill writes its audio files to ~/.openclaw/media/. Clean up this directory regularly to prevent disk bloat if you generate high volumes of audio.
  • Network Dependency: Although edge-tts runs locally, it needs an active internet connection to reach Microsoft's speech services while generating audio.
  • Latency: Generation typically takes 1 to 3 seconds. Plan your application flow to account for this short delay before the message is sent.
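
Two of the tips above lend themselves to small helpers: keeping each request within 300 characters, and pruning old files from the media directory. The following is a minimal sketch under stated assumptions. The function names, the sentence-ending characters, and the seven-day retention period are all hypothetical choices, not behavior of the skill itself.

```python
import time
from pathlib import Path
from typing import List

# Characters treated as sentence boundaries (an assumption for this sketch).
SENTENCE_ENDINGS = "。！？；!?;\n"


def split_text(text: str, max_chars: int = 300) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Step 1: cut the text into sentences, keeping the ending punctuation.
    sentences, buf = [], ""
    for ch in text:
        buf += ch
        if ch in SENTENCE_ENDINGS:
            sentences.append(buf)
            buf = ""
    if buf:
        sentences.append(buf)

    # Step 2: pack sentences into chunks without exceeding the limit.
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at fixed width.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


def prune_old_media(directory: Path, max_age_days: float = 7.0) -> List[Path]:
    """Delete regular files in directory older than max_age_days; return what was removed."""
    removed: List[Path] = []
    if not directory.is_dir():
        return removed
    cutoff = time.time() - max_age_days * 86400
    for path in directory.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

For the directory named in the tips, the call would be prune_old_media(Path.home() / ".openclaw" / "media"). Because this deletes files, it is worth checking the retention period before scheduling it.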

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-binbin1213-ms-voice-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
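
If clawhub.json already contains other plugins, pasting the snippet by hand risks overwriting them. The sketch below merges the entry programmatically instead. It assumes the file has the top-level "plugins" object shown above; the helper names are hypothetical, and the file's location is not stated in this page, so the caller must supply the path.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-binbin1213-ms-voice-tts"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of config with the plugin entry added, leaving other keys intact."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path: Path) -> None:
    """Read clawhub.json at the given path (if present), add the plugin, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

enable_plugin works on a copy, so the original dictionary is unchanged until update_config_file writes the result.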

Tags (AI)

#tts #voice-synthesis #automation #audio #multimodal
Safety Score: 4/5

Flags: file-write, file-read, external-api