jarvis-tts
Jarvis TTS: text-to-speech using Microsoft edge-tts with afplay playback. Use when users request voice output, audio responses, or text-to-speech. Provides natural-sounding Chinese TTS.
Why use this skill?
Enhance your OpenClaw agent with Jarvis TTS. Generate natural, human-like Chinese speech using Microsoft Neural voices directly on your macOS system.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/e421083458/jarvis-tts
What This Skill Does
The jarvis-tts skill is a text-to-speech bridge for the OpenClaw agent. It uses Microsoft's Neural TTS voices, accessed through the edge-tts library, to convert plain text into natural, expressive audio, and plays the result with the native macOS afplay command, so playback itself is local and adds little delay beyond the time needed for synthesis. It is aimed at users who prefer auditory interaction and want a more human-like voice than traditional robotic TTS. The skill handles the full lifecycle of an audio message: it synthesizes an MP3 file through the online service, plays it back, and then deletes the temporary file so that your system stays tidy and the communication loop stays responsive.
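The lifecycle described above (synthesize, play, clean up) might look roughly like the following. This is a minimal sketch, not the skill's actual source: the function names, the temporary-file prefix, and the injectable `synth`/`play` parameters are assumptions made for illustration. Only the `edge_tts.Communicate(...).save(...)` call and the `afplay` command come from the tools named in this listing.

```python
import asyncio
import os
import subprocess
import tempfile

DEFAULT_VOICE = "zh-CN-YunxiNeural"  # voice mentioned in this listing


def make_temp_mp3_path(prefix="jarvis-tts-"):
    """Reserve a unique temporary .mp3 path (the prefix is hypothetical)."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".mp3")
    os.close(fd)  # only the path is needed; the synthesizer reopens the file
    return path


def synthesize_to_file(text, voice, path):
    """Request speech from the edge-tts service and save it as MP3 (needs network)."""
    import edge_tts  # imported lazily so the other helpers work without it

    asyncio.run(edge_tts.Communicate(text, voice).save(path))


def play_file(path):
    """Block until macOS afplay has finished playing the file."""
    subprocess.run(["afplay", path], check=True)


def speak(text, voice=DEFAULT_VOICE, synth=synthesize_to_file, play=play_file):
    """Synthesize, play, then always delete the temporary file."""
    path = make_temp_mp3_path()
    try:
        synth(text, voice, path)
        play(path)
    finally:
        # Runs even if synthesis or playback raised, so no MP3 is left behind.
        if os.path.exists(path):
            os.remove(path)
    return path
```

Calling `speak("你好，我是 Jarvis。")` on a Mac with edge-tts installed would be the typical use. Putting the cleanup in a `finally` block is one way to remove the temporary file even when playback fails partway through.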
Installation
To install this skill, use the ClawHub command within your terminal or OpenClaw environment:
clawhub install openclaw/skills/skills/e421083458/jarvis-tts
Ensure you have Python 3 installed on your machine. You will also need to install the necessary dependency by running pip3 install edge-tts. The shell wrapper provided in the skill package requires macOS to function natively, as it relies on the system-integrated afplay utility for audio output.
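Before first use, the requirements above can be checked with a small preflight script. The sketch below is an assumption about how such a check could be written; `find_problems` is a hypothetical helper and is not part of the skill package.

```python
import importlib.util
import platform
import shutil


def find_problems(system=None, has_edge_tts=None, has_afplay=None):
    """Return a list of human-readable problems; an empty list means ready.

    Arguments default to probing the current machine, but can be passed
    explicitly (for example, in tests).
    """
    if system is None:
        system = platform.system()  # "Darwin" on macOS
    if has_edge_tts is None:
        has_edge_tts = importlib.util.find_spec("edge_tts") is not None
    if has_afplay is None:
        has_afplay = shutil.which("afplay") is not None

    problems = []
    if system != "Darwin":
        problems.append("afplay playback requires macOS (found %s)" % system)
    if not has_edge_tts:
        problems.append("edge-tts is missing; run: pip3 install edge-tts")
    if not has_afplay:
        problems.append("afplay was not found on PATH")
    return problems
```

Running `print(find_problems())` on the target machine lists anything that still needs attention.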
Use Cases
- AI Voice Interaction: Perfect for when you want your AI assistant to read its output out loud, enabling a hands-free experience.
- Automated Notifications: Use the skill to provide spoken alerts or reminders, such as "Your task list is complete" or "The timer has expired."
- Content Narration: Transform documents or long-form chat logs into spoken word, effectively creating an on-the-fly audiobook experience for research or accessibility needs.
- Accessibility: Assists users who prefer auditory input for verifying data or reading feedback during complex task execution.
Example Prompts
- "Jarvis, read this summary out loud for me."
- "Use your voice to tell me the weather report for today."
- "Summarize the current progress of the project and speak it using the Yunxi voice."
Tips & Limitations
- Voice Selection: Experiment with the available voices. Use zh-CN-YunxiNeural for a friendly, natural tone, or switch to zh-CN-YunyangNeural if you need a professional, news-anchor style delivery.
- Network Dependency: The skill fetches synthesized speech from Microsoft's online Edge TTS service, so you need an active internet connection to generate audio.
- Platform Constraint: Currently, this skill targets macOS. On Linux or Windows, the underlying script requires modifications to the audio playback command (using aplay or PowerShell respectively) to function correctly. Note that aplay does not decode MP3, so on Linux the audio must be converted first or played with an MP3-capable player.
- Cleanup: The skill manages temporary files automatically; however, if a process is interrupted, check your /tmp/ directory to ensure no orphaned MP3 files remain.
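For the cleanup tip, a small helper can find and remove MP3 files left behind after an interrupted run. This listing does not say how the skill names its temporary files, so the `*.mp3` pattern and the one-hour age threshold below are assumptions; narrow the pattern before deleting anything from a shared directory.

```python
import glob
import os
import time


def find_orphans(directory="/tmp", pattern="*.mp3", min_age_seconds=3600, now=None):
    """List files matching the pattern that were last modified long enough ago."""
    now = time.time() if now is None else now
    orphans = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        if now - os.path.getmtime(path) >= min_age_seconds:
            orphans.append(path)
    return orphans


def remove_orphans(paths):
    """Delete the given files, skipping any that vanished in the meantime."""
    removed = []
    for path in paths:
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass
    return removed
```

Inspecting the result of `find_orphans()` first, and only then passing it to `remove_orphans`, avoids deleting audio files that belong to other programs.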
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-e421083458-jarvis-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: network-access, file-write, file-read, external-api