walkie-talkie
Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.
Why use this skill?
Enable voice-to-voice conversations on WhatsApp with the OpenClaw walkie-talkie skill. Experience local transcription and TTS for a seamless hands-free AI experience.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/rubenfb23/walkie-talkie
What This Skill Does
The walkie-talkie skill turns your OpenClaw agent into a voice-responsive assistant for WhatsApp. It closes a voice-to-voice loop by combining local transcription with a local text-to-speech (TTS) engine. When enabled, incoming WhatsApp voice notes are automatically transcribed to text with whisper-cpp, and the agent's generated responses are synthesized into audio files with sherpa-onnx-tts. Users can then interact with the agent entirely through speech, in a flow that resembles a live conversation.
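The loop described above can be sketched in Python as follows. This is a minimal illustration and not the skill's actual implementation: the function names, the file names, and the choice of 16 kHz mono WAV as the intermediate format are assumptions. The transcription and TTS steps are passed in as callables because the exact whisper-cpp and sherpa-onnx-tts invocations depend on how they are installed.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def ffmpeg_to_wav_cmd(src: Path, dst: Path) -> List[str]:
    """Build an ffmpeg command that decodes a voice note to 16 kHz mono WAV."""
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]


def ffmpeg_to_opus_cmd(src: Path, dst: Path) -> List[str]:
    """Build an ffmpeg command that encodes a WAV reply as Ogg/Opus."""
    return ["ffmpeg", "-y", "-i", str(src), "-c:a", "libopus", str(dst)]


def handle_voice_note(
    note: Path,
    workdir: Path,
    run: Callable[[List[str]], None],
    transcribe: Callable[[Path], str],
    reply_to: Callable[[str], str],
    synthesize: Callable[[str, Path], None],
) -> Tuple[str, Path]:
    """Run one voice-in, voice-out turn and return (reply text, reply audio path).

    All four callables are hypothetical hooks:
      run        executes a command line, e.g. subprocess.run(cmd, check=True)
      transcribe wraps the local whisper-cpp call
      reply_to   asks the agent for a text answer
      synthesize wraps the local sherpa-onnx-tts call and writes a WAV file
    """
    wav_in = workdir / "in.wav"
    wav_out = workdir / "reply.wav"
    ogg_out = workdir / "reply.ogg"

    run(ffmpeg_to_wav_cmd(note, wav_in))       # 1. decode the incoming note
    text = transcribe(wav_in)                  # 2. speech to text
    answer = reply_to(text)                    # 3. agent generates a reply
    synthesize(answer, wav_out)                # 4. text to speech
    run(ffmpeg_to_opus_cmd(wav_out, ogg_out))  # 5. encode for WhatsApp
    return answer, ogg_out
```

Returning the text alongside the audio path lets the caller send both, and injecting the callables keeps the flow testable without any of the binaries installed.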
Installation
To integrate this capability into your OpenClaw instance, run the following command in your terminal:
clawhub install openclaw/skills/skills/rubenfb23/walkie-talkie
Ensure you have the required local dependencies installed, specifically ffmpeg, whisper-cpp, and sherpa-onnx-tts, as the skill relies on these binaries for local execution.
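A quick way to confirm the dependencies are reachable is to look them up on PATH. The sketch below assumes the executables are named exactly as listed above; some packages install them under different names, so adjust the tuple to match your system.

```python
import shutil
from typing import Iterable, List

# Assumed executable names, taken from the dependency list above.
REQUIRED_BINARIES = ("ffmpeg", "whisper-cpp", "sherpa-onnx-tts")


def missing_binaries(names: Iterable[str] = REQUIRED_BINARIES) -> List[str]:
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found on PATH.")
```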
Use Cases
- Hands-free assistance: Ideal for users who are driving, cooking, or otherwise occupied and cannot type messages.
- Accessibility: Provides an intuitive way for visually impaired users to interact with the OpenClaw agent effectively.
- Contextual communication: Useful when a user prefers the nuance and tone of spoken language over text for complex instructions.
- Rapid dialogue: Handles quick status updates or brief inquiries where a voice note is faster than typing.
Example Prompts
- "Activate walkie-talkie mode, I need you to help me with the shopping list while I drive."
- "Let's talk by voice from now on, I'd rather not type messages."
- [User sends a voice note asking: "What's the weather like in Madrid today?"]
Tips & Limitations
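- Performance: The skill targets low latency, with an RTF (Real-Time Factor, processing time divided by audio duration) below 0.5, meaning a voice note is processed in less than half its own length. To maintain this, ensure your hardware meets the requirements for local whisper-cpp inference.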
- Hybrid Output: The skill is configured to send both text and audio. Text provides a safety net for clarity and accessibility, while audio provides the conversational experience.
- Privacy: By using local tools, all audio processing remains on-device, enhancing privacy compared to cloud-based alternatives.
- Constraints: Currently, only the opus/ogg format for WhatsApp voice notes is supported natively.
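To check the performance figure on your own hardware, the Real-Time Factor can be computed as processing time divided by audio duration. The helper names below are illustrative and not part of the skill; the 0.5 default mirrors the target quoted in the Performance tip.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def meets_target(
    processing_seconds: float, audio_seconds: float, target: float = 0.5
) -> bool:
    """Return True when the measured RTF is below the target."""
    return real_time_factor(processing_seconds, audio_seconds) < target
```

For example, a 10-second voice note transcribed in 2 seconds gives an RTF of 0.2, which is within the target. Measure the processing time with time.perf_counter() around the transcription call.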
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-rubenfb23-walkie-talkie": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, file-write, code-execution
Related Skills
arxiv-watcher
Search and summarize papers from ArXiv. Use when the user asks for the latest research, specific topics on ArXiv, or a daily summary of AI papers.
whatsapp-styler
Skill to ensure all messages sent to WhatsApp follow the platform's specific formatting syntax. It prevents markdown bloat and ensures a clean, mobile-first reading experience.