voice-tts
Voice input (Whisper ASR) and voice output (Edge TTS) skill. Supports agent-specific voices and can call send_voice_reply.mjs to send Telegram voice messages.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/believe3344/voice-tts
What This Skill Does
The voice-tts skill adds speech-to-text (ASR) and text-to-speech (TTS) capabilities to OpenClaw agents. With it, your agent can understand voice messages sent via platforms like Telegram, Discord, or Lark, and respond with natural-sounding voice replies. Whisper handles transcription and Edge TTS handles speech synthesis, so each exchange can carry both the transcribed text and an audio response.
Installation
To install this skill, use the clawhub command: clawhub install openclaw/skills/skills/believe3344/voice-tts.
After installation, follow these mandatory steps:
- Navigate to the scripts/ directory within the skill path.
- Rename the files by removing the .txt extension so they can be run as scripts.
- Install the necessary dependencies via pip: pip install edge-tts whisper torch click. (OpenAI's Whisper is published on PyPI as openai-whisper; if the installed whisper module does not provide load_model, install that package instead.)
- Ensure ffmpeg is installed on your system (via brew install ffmpeg on macOS or sudo apt install ffmpeg on Ubuntu).
- Configure openclaw.json as detailed in the documentation to enable audio tool integration.
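The renaming and ffmpeg steps above can be sketched in Python as follows. This is a minimal sketch, not part of the skill: the helper names strip_txt_suffix and missing_tools are hypothetical, and the script assumes the files in scripts/ only differ from their final names by a trailing .txt.

```python
import shutil
from pathlib import Path


def strip_txt_suffix(scripts_dir: Path) -> list[Path]:
    """Rename every '*.txt' file in scripts_dir by dropping the '.txt' suffix."""
    renamed = []
    for path in sorted(scripts_dir.glob("*.txt")):
        target = path.with_suffix("")  # 'name.mjs.txt' -> 'name.mjs'
        if target.exists():
            continue  # never overwrite a file that is already in place
        path.rename(target)
        renamed.append(target)
    return renamed


def missing_tools(tools=("ffmpeg",)) -> list[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling strip_txt_suffix(Path("scripts")) from the skill path and then checking that missing_tools() returns an empty list covers the two steps that are easiest to forget.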
Use Cases
This skill is perfect for agents operating in messaging environments where users are on the move or prefer hands-free interaction. It is ideal for:
- Virtual assistants handling customer inquiries received via voice notes.
- Personal productivity agents that provide vocal summaries and reminders.
- Multimodal bots in social communities that want to engage users through natural language interactions.
- Automated transcription services that archive voice messages for later text review.
Example Prompts
- "(User sends a 30-second voice message)" -> The agent transcribes the message using Whisper and replies with a conversational text response plus an audio file generated by Edge TTS.
- "用语音读出最新的周报摘要" (Read the latest weekly report summary using voice)
- "请把这个消息用语音回复我" (Please reply to this message using voice)
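The flow behind these prompts (transcribe the incoming audio, produce a reply, synthesize it as speech) might look roughly like the sketch below. It is an illustration under stated assumptions, not the skill's actual code: it assumes the openai-whisper and edge-tts packages plus ffmpeg are installed, make_reply is a hypothetical callable standing in for the agent, and the conversion to OGG/Opus is an assumption about what a Telegram voice message should contain.

```python
import asyncio
import subprocess
from pathlib import Path


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with Whisper (needs openai-whisper and ffmpeg)."""
    import whisper  # imported lazily so this module loads without the dependency

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"].strip()


async def synthesize(text: str, voice: str, out_path: str) -> None:
    """Write `text` as speech to out_path using Edge TTS (needs network access)."""
    import edge_tts  # imported lazily for the same reason as above

    await edge_tts.Communicate(text, voice).save(out_path)


def opus_command(src: str, dst: str) -> list[str]:
    """Build the ffmpeg command that re-encodes src to Opus audio at dst."""
    return ["ffmpeg", "-y", "-i", src, "-c:a", "libopus", dst]


def voice_reply(audio_in: str, make_reply, voice: str = "en-US-JennyNeural",
                workdir: str = ".") -> tuple[str, str, Path]:
    """Return (transcript, reply text, path to the reply audio)."""
    transcript = transcribe(audio_in)
    reply_text = make_reply(transcript)  # hypothetical hook: the agent's answer
    mp3_path = Path(workdir) / "reply.mp3"
    ogg_path = mp3_path.with_suffix(".ogg")
    asyncio.run(synthesize(reply_text, voice, str(mp3_path)))
    subprocess.run(opus_command(str(mp3_path), str(ogg_path)), check=True)
    return transcript, reply_text, ogg_path
```

The returned file path is what a sender script such as send_voice_reply.mjs would then deliver; how that script is invoked is not covered here.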
Tips & Limitations
- Model Selection: For faster performance on limited hardware, use the tiny or base Whisper models. For higher accuracy, large-v3 is recommended, though it requires significantly more RAM.
- Voice Customization: You can customize the agent's tone by switching between available Edge TTS neural voices like zh-CN-XiaoxiaoNeural or en-US-JennyNeural.
- FFmpeg Requirement: Do not skip the ffmpeg installation; it is the backbone of the audio processing pipeline and is required for both recording and playback formats.
- Automatic Hooks: Take advantage of the auto_voice_check script for batch processing inbound voice messages to keep your agent's task queue efficient.
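The model and voice tips above could be captured in two small helpers like the ones below. This is a sketch only: the memory figures are rough assumptions loosely based on the Whisper README, the agent ids are made up for illustration, and none of these names come from the skill itself.

```python
# Approximate memory needs in GB per Whisper model. These are assumptions;
# measure on your own hardware before relying on them.
MODEL_MEMORY_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]

# Hypothetical agent ids mapped to Edge TTS neural voices.
AGENT_VOICES = {
    "assistant-zh": "zh-CN-XiaoxiaoNeural",
    "assistant-en": "en-US-JennyNeural",
}
DEFAULT_VOICE = "en-US-JennyNeural"


def pick_whisper_model(available_gb: float) -> str:
    """Return the largest model whose estimated need fits the memory budget."""
    chosen = "tiny"  # fall back to the smallest model if nothing fits
    for name, needed_gb in MODEL_MEMORY_GB:
        if needed_gb <= available_gb:
            chosen = name
    return chosen


def voice_for(agent_id: str) -> str:
    """Return the voice configured for an agent, or the default voice."""
    return AGENT_VOICES.get(agent_id, DEFAULT_VOICE)
```

Running edge-tts --list-voices prints the voices that can be used as values in AGENT_VOICES.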
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-believe3344-voice-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, file-write, external-api
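If clawhub.json already contains other plugins, pasting the fragment by hand risks overwriting them. A small merge script is one way to avoid that; the sketch below assumes the file is plain JSON with a top-level "plugins" object, as in the fragment shown earlier, and enable_plugin is a hypothetical helper name.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-believe3344-voice-tts"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Enable the plugin in config_path while keeping all other settings."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}  # start from an empty config if the file is missing
    plugins = config.setdefault("plugins", {})
    # Update only the two keys from the fragment; other keys stay untouched.
    plugins.setdefault(plugin_id, {}).update({"enabled": True, "auto_update": True})
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, enable_plugin(Path("clawhub.json")) adds the entry and leaves any existing plugin entries as they were.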