Official Verified media Safety 4/5

voice-tts

Voice input (Whisper ASR) + voice output (Edge TTS) skill. It supports a dedicated voice for each agent and can call send_voice_reply.mjs to send Telegram voice messages.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/believe3344/voice-tts

What This Skill Does

The voice-tts skill adds speech-to-text (ASR) and text-to-speech (TTS) to OpenClaw agents. Voice messages sent through platforms such as Telegram, Discord, or Lark are transcribed with Whisper, and the agent can answer with a natural-sounding voice reply synthesized by Edge TTS. Replies can carry both the transcribed text and the audio, so users can either read or listen.
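The ASR-to-TTS flow described above can be sketched in Python as three pluggable steps. This is a minimal illustration under stated assumptions, not the skill's own code: `handle_voice_message`, `whisper_transcribe`, and `edge_tts_synthesize` are hypothetical names, the default voice is an assumption, and the real skill may wire these steps differently. The two default backends use the public openai-whisper (`whisper.load_model(...).transcribe(...)`) and edge-tts (`edge_tts.Communicate(...).save(...)`) Python APIs and are imported lazily, so the orchestration can be exercised with stub functions.

```python
import asyncio
from typing import Callable, Dict


def whisper_transcribe(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with openai-whisper (imported lazily)."""
    import whisper  # module provided by the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()


def edge_tts_synthesize(text: str, voice: str, out_path: str) -> str:
    """Synthesize speech with edge-tts and write the audio to out_path (imported lazily)."""
    import edge_tts

    asyncio.run(edge_tts.Communicate(text, voice).save(out_path))
    return out_path


def handle_voice_message(
    audio_path: str,
    reply_fn: Callable[[str], str],
    out_path: str = "reply.mp3",
    voice: str = "zh-CN-XiaoxiaoNeural",  # assumed default voice
    transcribe: Callable[[str], str] = whisper_transcribe,
    synthesize: Callable[[str, str, str], str] = edge_tts_synthesize,
) -> Dict[str, str]:
    """Hypothetical pipeline: ASR -> agent reply text -> TTS for one inbound voice message."""
    transcript = transcribe(audio_path)
    reply_text = reply_fn(transcript)
    reply_audio = synthesize(reply_text, voice, out_path)
    return {"transcript": transcript, "reply_text": reply_text, "reply_audio": reply_audio}
```

Keeping the backends as parameters means the same orchestration works with a different ASR or TTS engine, or with stubs in tests, without touching the control flow.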

Installation

To install this skill, use the clawhub command: clawhub install openclaw/skills/skills/believe3344/voice-tts.

After installation, follow these mandatory steps:

Navigate to the scripts/ directory within the skill path.
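Remove the .txt extension from the files there to restore their original script names so they can be run.
Install the Python dependencies via pip: pip install edge-tts openai-whisper torch click. (OpenAI's Whisper is published on PyPI as openai-whisper; the PyPI package named whisper is an unrelated project.)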
Ensure ffmpeg is installed on your system (via brew install ffmpeg on macOS or sudo apt install ffmpeg on Ubuntu).
Configure openclaw.json as detailed in the documentation to enable audio tool integration.
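The renaming and prerequisite steps above can be scripted. The sketch below is a hypothetical helper, not part of the skill: `restore_script_names` and `missing_prerequisites` are illustrative names. It strips a trailing .txt from files in a directory you pass in, then reports which of the listed Python modules cannot be found and whether ffmpeg is on PATH. It does not cover the openclaw.json step, whose format is defined in the skill's documentation.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import List

# Import names of the pip packages listed above (openai-whisper imports as "whisper").
REQUIRED_MODULES = ["edge_tts", "whisper", "torch", "click"]


def restore_script_names(scripts_dir: str) -> List[str]:
    """Strip a trailing .txt from every file in scripts_dir and return the new names."""
    renamed = []
    for path in sorted(Path(scripts_dir).iterdir()):  # list first, then rename
        if path.is_file() and path.suffix == ".txt":
            target = path.with_suffix("")  # e.g. send_voice_reply.mjs.txt -> send_voice_reply.mjs
            if target.exists():
                continue  # never overwrite an existing file
            path.rename(target)
            renamed.append(target.name)
    return renamed


def missing_prerequisites() -> List[str]:
    """List modules that cannot be found, plus "ffmpeg" if it is not on PATH."""
    missing = [m for m in REQUIRED_MODULES if importlib.util.find_spec(m) is None]
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing
```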

Use Cases

This skill is perfect for agents operating in messaging environments where users are on the move or prefer hands-free interaction. It is ideal for:

Virtual assistants handling customer inquiries received via voice notes.
Personal productivity agents that provide vocal summaries and reminders.
Multimodal bots in social communities that want to engage users through voice as well as text.
Automated transcription services that archive voice messages for later text review.

Example Prompts

"(User sends a 30-second voice message)" -> The agent transcribes the message using Whisper and replies with a conversational text response plus an audio file generated by Edge TTS.
"用语音读出最新的周报摘要" (Read the latest weekly report summary using voice)
"请把这个消息用语音回复我" (Please reply to this message using voice)

Tips & Limitations

Model Selection: For faster performance on limited hardware, use the tiny or base Whisper models. For higher accuracy, large-v3 is recommended, though it needs substantially more memory (system RAM on CPU, or VRAM when running on a GPU).
Voice Customization: You can customize the agent's tone by switching between available Edge TTS neural voices like zh-CN-XiaoxiaoNeural or en-US-JennyNeural.
FFmpeg Requirement: Do not skip the ffmpeg installation. It is the backbone of the audio processing pipeline: Whisper calls ffmpeg to decode incoming recordings, and it is needed to handle the audio formats used for both input and playback.
Automatic Hooks: Take advantage of the auto_voice_check script for batch processing inbound voice messages to keep your agent's task queue efficient.
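To make the Model Selection trade-off concrete, the hypothetical helper below picks the largest Whisper model whose approximate memory requirement fits a given budget. The figures are the approximate VRAM requirements listed in the openai-whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, 10 GB for the large models); treat them as ballpark values, not guarantees. `pick_whisper_model` is not part of the skill; its return value is a name that `whisper.load_model` accepts.

```python
# Approximate memory needs in GB, ordered fastest -> most accurate (openai-whisper README, ballpark).
MODEL_MEMORY_GB = [
    ("tiny", 1.0),
    ("base", 1.0),
    ("small", 2.0),
    ("medium", 5.0),
    ("large-v3", 10.0),
]


def pick_whisper_model(available_gb: float, prefer_speed: bool = False) -> str:
    """Return the most accurate model that fits available_gb, or the fastest if prefer_speed."""
    fitting = [name for name, need in MODEL_MEMORY_GB if need <= available_gb]
    if not fitting:
        raise ValueError(f"not enough memory for any Whisper model: {available_gb} GB")
    return fitting[0] if prefer_speed else fitting[-1]
```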
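For the Voice Customization tip and the per-agent voices the skill supports, one way to picture the setup is a mapping from agent name to Edge TTS voice that is turned into an edge-tts command line. The agent names, the fallback voice, and `build_edge_tts_command` are hypothetical; the `--voice`, `--text`, and `--write-media` options belong to the edge-tts command-line tool installed by the edge-tts pip package, and `edge-tts --list-voices` prints the voice names available.

```python
import subprocess
from typing import Dict, List

# Hypothetical per-agent voice assignments; any Edge TTS neural voice name can be used.
AGENT_VOICES: Dict[str, str] = {
    "assistant-zh": "zh-CN-XiaoxiaoNeural",
    "assistant-en": "en-US-JennyNeural",
}
DEFAULT_VOICE = "zh-CN-XiaoxiaoNeural"  # assumed fallback for unmapped agents


def build_edge_tts_command(agent: str, text: str, out_path: str) -> List[str]:
    """Build an argv list for the edge-tts CLI using the agent's configured voice."""
    voice = AGENT_VOICES.get(agent, DEFAULT_VOICE)
    return ["edge-tts", "--voice", voice, "--text", text, "--write-media", out_path]


def synthesize_for_agent(agent: str, text: str, out_path: str) -> None:
    """Run edge-tts; raises subprocess.CalledProcessError if synthesis fails."""
    subprocess.run(build_edge_tts_command(agent, text, out_path), check=True)
```

Passing an argv list rather than a shell string keeps user-supplied reply text from being interpreted by a shell.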
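As an example of the format handling behind the FFmpeg Requirement tip: edge-tts writes MP3 by default, while Telegram voice notes are natively OGG files encoded with Opus. The sketch below builds an ffmpeg command for that conversion. The function names and the 32 kbit/s bitrate are assumptions, libopus must be compiled into your ffmpeg build (it usually is in Homebrew and apt packages), and whether send_voice_reply.mjs converts audio this way is not stated here.

```python
import shutil
import subprocess
from typing import List


def build_opus_command(src: str, dst: str, bitrate: str = "32k") -> List[str]:
    """ffmpeg argv that re-encodes src into an OGG/Opus file suitable for a voice note."""
    return [
        "ffmpeg",
        "-y",               # overwrite dst if it already exists
        "-i", src,          # input, e.g. the MP3 produced by edge-tts
        "-vn",              # drop any video or cover-art stream
        "-c:a", "libopus",  # encode audio with Opus
        "-b:a", bitrate,    # target audio bitrate
        dst,                # the .ogg extension selects the Ogg container
    ]


def convert_to_opus(src: str, dst: str) -> None:
    """Run the conversion, failing early with a clear message if ffmpeg is missing."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    subprocess.run(build_opus_command(src, dst), check=True)
```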
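The batching idea in the Automatic Hooks tip can be illustrated with a small sketch that collects inbound voice files without a transcript and processes them in one pass. This does not reproduce the actual auto_voice_check script: the inbox layout (each audio file gets a sibling transcript named like voice.ogg.txt), the accepted extensions, and the names `pending_voice_files` and `process_batch` are all assumptions. The `transcribe` argument can be any function that maps an audio path to text, such as a Whisper wrapper.

```python
from pathlib import Path
from typing import Callable, List

AUDIO_EXTENSIONS = {".ogg", ".oga", ".mp3", ".m4a", ".wav"}  # assumed inbound formats


def transcript_path(audio: Path) -> Path:
    """Sibling transcript file, e.g. voice.ogg -> voice.ogg.txt (avoids name clashes)."""
    return audio.with_name(audio.name + ".txt")


def pending_voice_files(inbox: str) -> List[Path]:
    """Audio files in inbox that do not have a transcript yet, in name order."""
    return sorted(
        p for p in Path(inbox).iterdir()
        if p.is_file()
        and p.suffix.lower() in AUDIO_EXTENSIONS
        and not transcript_path(p).exists()
    )


def process_batch(inbox: str, transcribe: Callable[[str], str], limit: int = 10) -> List[str]:
    """Transcribe up to `limit` pending files, writing each transcript next to its audio."""
    done = []
    for audio in pending_voice_files(inbox)[:limit]:
        transcript_path(audio).write_text(transcribe(str(audio)), encoding="utf-8")
        done.append(audio.name)
    return done
```

Capping each pass with `limit` keeps one burst of voice notes from monopolizing the agent's task queue.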

Metadata

Author: @believe3344

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-believe3344-voice-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
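When clawhub.json already lists other plugins, the entry above belongs inside the existing `plugins` object rather than replacing it. A minimal Python sketch of that merge, assuming the top-level `plugins` layout shown above; `enable_plugin` is a hypothetical helper, not a clawhub command.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-believe3344-voice-tts"


def enable_plugin(config_path: str, plugin_id: str = PLUGIN_ID, auto_update: bool = True) -> dict:
    """Add or update one plugin entry in clawhub.json, keeping all other settings."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": auto_update})
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return config
```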

Tags (AI)

#voice #tts #asr #audio #multimodal

Safety Score: 4/5

Flags: file-read, file-write, external-api

Related Skills

xiaoai-ha-control

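Controls XiaoAi speakers through Home Assistant + Xiaomi Miot, with optional support for a "XiaoAi voice → OpenClaw" bridge. It covers two scenarios: 1) when the user asks to "have XiaoAi say something / announce… / notify…", "tell XiaoAi… / have XiaoAi run…", or "have XiaoAi play audio / an mp3 / a link", use this skill for downstream control; 2) when the XiaoAi voice bridge is connected, handle upstream messages tagged `【来自小爱】` / `【来自小爱语音】` ("from XiaoAi" / "from XiaoAi voice"). Use this skill whenever a task involves controlling a XiaoAi speaker, running commands through XiaoAi, announcing results through XiaoAi, or bridging and routing messages that originate from XiaoAi.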

believe3344 · 4473 stars