imessage-voice-reply
Send voice message replies in iMessage using local Kokoro-ONNX TTS. Generates native iMessage voice bubbles (CAF/Opus) that play inline with waveform — not file attachments. Use when receiving a voice message in iMessage and wanting to reply with voice, enabling voice-to-voice iMessage conversations, or sending audio responses. Zero cost — all TTS runs locally. Requires BlueBubbles channel configured in OpenClaw.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/bolander72/imessage-voice-reply
What This Skill Does
The imessage-voice-reply skill lets your OpenClaw agent generate and send native voice messages within iMessage. Unlike standard file attachments, these messages render like those recorded in the native Messages.app, complete with interactive waveform playback. The skill synthesizes speech with a local Kokoro-ONNX text-to-speech (TTS) model and converts the result with the macOS-native afconvert tool, so audio is produced without cloud dependencies or API costs. This lets your agent take part in voice-to-voice conversations and gives automated responses a more human touch.
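The conversion step described above can be sketched as follows. This is a minimal illustration, not the skill's actual script: the helper name build_afconvert_command is hypothetical, and the "opus@24000" data format and mono channel setting are assumptions about a reasonable CAF/Opus target. The sketch only builds the argument list; it does not run afconvert.

```python
import shlex
from pathlib import Path
from typing import List


def build_afconvert_command(wav_path: str, out_dir: str) -> List[str]:
    """Build an afconvert argv that wraps Opus audio in a CAF container.

    Hypothetical helper: the data format string and channel count are
    assumptions, not values confirmed by the skill's own scripts.
    """
    out_path = Path(out_dir) / "Audio Message.caf"
    return [
        "afconvert",
        "-f", "caff",        # output file format: Core Audio Format
        "-d", "opus@24000",  # assumed data format and sample rate
        "-c", "1",           # assumed mono output
        str(wav_path),
        str(out_path),
    ]


cmd = build_afconvert_command("reply.wav", "out")
# shlex.join quotes the output path, which contains a space
print(shlex.join(cmd))
```

On a Mac, the resulting list could be passed to subprocess.run(cmd, check=True) once the WAV file exists.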
Installation
Installation is streamlined through the OpenClaw ecosystem. First, ensure you have the BlueBubbles channel configured in your agent settings. Run the setup command: bash ${baseDir}/scripts/setup.sh. This script handles the installation of critical dependencies including kokoro-onnx, soundfile, and numpy. It also manages the automatic download of the necessary voice models (approx. 136MB) to your local cache. Once the setup completes, the tool is ready to be invoked by your agent's orchestration layer.
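To confirm that setup finished, one option is to check that the three Python packages named above are importable. The sketch below is an assumption about how such a check might look, not part of setup.sh; check_dependencies is a hypothetical helper, and it does not verify that the voice models were downloaded.

```python
import importlib.util

# Import names for the packages the setup script is described as installing.
REQUIRED_MODULES = ("kokoro_onnx", "soundfile", "numpy")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a mapping of module name to whether it can be imported."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in modules
    }


status = check_dependencies()
missing = [name for name, ok in status.items() if not ok]
print("missing modules:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, rerunning the setup script is the likely fix.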
Use Cases
This skill is ideal for scenarios requiring a more personal or accessible interaction. Use it for voice-for-voice reply cycles when the recipient has initiated a voice note, or when providing audio-first responses in a conversational flow. It is particularly effective for users who prefer listening to messages rather than reading, or when tone and inflection are necessary to convey information clearly. Always consider including a brief text summary alongside the voice message for maximum accessibility and readability.
Example Prompts
- "The user just sent me a voice note asking for a recap of today's meeting; generate a polite voice reply acknowledging their message and provide the summary."
- "I need to respond to this iMessage voice thread. Please generate an audio response using the af_heart voice, confirming that I'll be home in ten minutes."
- "Send a friendly voice acknowledgement to this contact. Make sure to use the native iMessage bubble format so they can listen to it immediately."
Tips & Limitations
For messages to render as native bubbles, the output must meet three requirements: the filename must be 'Audio Message.caf', the content type must be 'audio/x-caf', and the 'asVoice' parameter in the BlueBubbles tool must be set to true. The skill supports multiple languages and voices, such as 'af_heart' or 'ef_dora', but some languages have limited voice options. Keep response text concise: shorter input synthesizes faster, and long audio clips can feel cumbersome in a chat. Because all processing is local, performance depends on your machine's CPU.
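The three bubble requirements above can be checked before sending. This is a hedged sketch: only the 'asVoice' name and the required values come from the text, while the "filename", "contentType", and "path" keys and both helper functions are hypothetical and may not match the BlueBubbles tool's real parameter names.

```python
# Required values from the tips above; key names other than "asVoice"
# are illustrative assumptions.
REQUIRED_VOICE_FIELDS = {
    "filename": "Audio Message.caf",
    "contentType": "audio/x-caf",
    "asVoice": True,
}


def voice_attachment_params(path):
    """Build a parameter dict for a voice-bubble attachment (hypothetical shape)."""
    return {"path": path, **REQUIRED_VOICE_FIELDS}


def find_problems(params):
    """List every required field whose value would break native rendering."""
    return [
        f"{key} must be {expected!r}, got {params.get(key)!r}"
        for key, expected in REQUIRED_VOICE_FIELDS.items()
        if params.get(key) != expected
    ]


good = voice_attachment_params("/tmp/Audio Message.caf")
bad = {**good, "filename": "reply.caf", "asVoice": "true"}
print(find_problems(good))
print(find_problems(bad))
```

Rejecting a bad parameter set up front is cheaper than discovering after sending that the message arrived as a plain file attachment.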
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-bolander72-imessage-voice-reply": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, code-execution
Related Skills
ratgdo32-disco
Control a ratgdo32 disco garage door opener via its local web API. Use when the user asks to open/close the garage, check garage status, toggle the garage light, check if a car is parked, enable/disable remotes, or anything involving the garage door. Supports door control, light, obstruction detection, vehicle presence (laser sensor), parking assist, motion, and remote lockout. Uses local network trust model (LAN-only, no internet exposure).
ai-voice-chat
Hands-free AI voice conversations via AirPods or any Bluetooth headset. MLX-Whisper STT (Apple Silicon GPU, ~130ms) + hybrid LLM routing (local gemma3 for simple chat, cloud for complex) + Kokoro-ONNX TTS with sentence streaming. Auto-starts on headset connect, supports mid-conversation language switching. Simple conversations run fully local and free (~2.4s total latency). Complex queries route to cloud (~5s). Zero cost for voice processing — only cloud LLM API tokens for complex queries.