walkie-talkie
Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.
Why use this skill?
Enable real-time voice-to-voice conversations on WhatsApp with the OpenClaw Walkie-Talkie skill. It uses local AI models for fast transcription and speech synthesis.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/rubenfb23/vocal-chat
What This Skill Does
The walkie-talkie skill turns OpenClaw into a voice-first assistant tailored for WhatsApp. By closing the loop between audio input and generated speech output, it enables a natural, hands-free conversation style. The skill runs two local engines (whisper-cpp for transcription and sherpa-onnx-tts for synthesis) to keep audio private and latency low. When triggered, it intercepts incoming Ogg/Opus files, converts them to text for the reasoning engine, and sends a spoken response back to the user, so no typing is needed.
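The first step of that loop, turning an incoming Ogg/Opus voice note into something a local transcriber can read, might look like the sketch below. It is an illustration only, not the skill's actual code: the function name and file paths are hypothetical, and it assumes the transcriber wants 16 kHz mono 16-bit PCM WAV input, which is the usual expectation for whisper.cpp. The function only builds the ffmpeg argument list and does not run it.

```python
from pathlib import Path
from typing import List, Optional


def build_wav_conversion_cmd(ogg_path: str, wav_path: Optional[str] = None) -> List[str]:
    """Build an ffmpeg command that converts an Ogg/Opus file to 16 kHz mono WAV.

    Hypothetical helper: the output format is an assumption about what the
    local transcriber expects, not something stated by the skill itself.
    """
    src = Path(ogg_path)
    # Default to the same file name with a .wav extension.
    dst = Path(wav_path) if wav_path else src.with_suffix(".wav")
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it already exists
        "-i", str(src),      # input voice note
        "-ar", "16000",      # resample to 16 kHz
        "-ac", "1",          # downmix to mono
        "-c:a", "pcm_s16le", # 16-bit signed PCM
        str(dst),
    ]


if __name__ == "__main__":
    print(" ".join(build_wav_conversion_cmd("voice-note.ogg")))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed.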
Installation
To integrate this functionality into your environment, run the following command in your terminal:
clawhub install openclaw/skills/skills/rubenfb23/vocal-chat
Ensure that you have the required dependencies installed on your system, specifically ffmpeg, whisper-cpp, and the sherpa-onnx-tts binary, as these are critical for the skill's local processing capability.
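A quick way to confirm those dependencies before enabling the skill is to look each one up on the PATH. The sketch below assumes the executables are named exactly as in the text above (ffmpeg, whisper-cpp, sherpa-onnx-tts); your installation may use different binary names, so treat the tuple as a placeholder to adjust.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names, taken from the dependency list above.
# Adjust them if your packages install the binaries under other names.
REQUIRED_TOOLS = ("ffmpeg", "whisper-cpp", "sherpa-onnx-tts")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of tools that cannot be found on the PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

The which parameter exists only so the lookup can be swapped out in tests; in normal use the default shutil.which is enough.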
Use Cases
This skill is ideal for users on the move who cannot safely or conveniently type on their devices. It is perfect for voice-to-voice brainstorming sessions, hands-free automation management while driving or commuting, and users who prefer spoken language for more expressive interaction. It also serves as a robust accessibility tool for those who prefer verbal communication over textual input.
Example Prompts
- "Turn on walkie-talkie mode; from now on I want to reply to you only with voice messages."
- "Let's talk by voice from now on. Can you summarize the key points of my last meeting?"
- "Hey, turn off walkie-talkie mode when we finish this session."
Tips & Limitations
To maintain the required Real-Time Factor (RTF, processing time divided by audio duration) of less than 0.5, ensure your machine has enough spare CPU capacity for local inference. Because this skill relies on local file processing, avoid running heavy background tasks during rapid voice exchanges to prevent delayed or stuttering audio responses. Both text and audio are sent for clarity, but the audio file is the primary medium for this skill. If transcription fails, verify that the input audio format is supported by your local ffmpeg build.
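To check whether a machine meets that budget, time one exchange and divide by the length of the audio. The sketch below uses the standard definition of RTF (processing time over audio duration); the function names are hypothetical, and whether the 0.5 limit applies to transcription, synthesis, or the whole round trip is an assumption you should set for your own measurement.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def meets_rtf_budget(
    processing_seconds: float,
    audio_seconds: float,
    max_rtf: float = 0.5,  # threshold quoted in the tips above
) -> bool:
    """True when the measured RTF is strictly below the budget."""
    return real_time_factor(processing_seconds, audio_seconds) < max_rtf


if __name__ == "__main__":
    # Example: 2 seconds of processing for an 8-second voice note.
    print(real_time_factor(2.0, 8.0))
    print(meets_rtf_budget(2.0, 8.0))
```

An RTF below 1.0 means processing finishes faster than the audio plays; the lower the value, the more headroom remains for other tasks.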
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-rubenfb23-vocal-chat": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, file-write, code-execution
Related Skills
arxiv-watcher
Search and summarize papers from ArXiv. Use when the user asks for the latest research, specific topics on ArXiv, or a daily summary of AI papers.
whatsapp-styler
Skill to ensure all messages sent to WhatsApp follow the platform's specific formatting syntax. It prevents markdown bloat and ensures a clean, mobile-first reading experience.