Official | Verified | communication | Safety 4/5

walkie-talkie

Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.

Why use this skill?

Enable real-time voice-to-voice conversations on WhatsApp with the OpenClaw Walkie-Talkie skill. It uses local AI models for fast transcription and speech synthesis.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/rubenfb23/vocal-chat

What This Skill Does

The walkie-talkie skill turns OpenClaw into a voice-first assistant for WhatsApp. It closes the loop between incoming audio and generated speech, so a conversation can proceed hands-free. Transcription and synthesis both run locally (whisper-cpp for speech-to-text, sherpa-onnx-tts for text-to-speech), which keeps audio on your machine and latency low. When triggered, the skill picks up incoming Ogg/Opus files, transcribes them to text for the reasoning engine, and sends the reply back to the user as audio, so no typing is needed.
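The audio conversion at each end of that loop can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function names are hypothetical, the 16 kHz mono 16-bit WAV target is the input format whisper.cpp expects, and the 32k reply bitrate is an arbitrary choice. The transcription and synthesis calls are left out because the listing does not give their exact invocation, and the Opus encode assumes an ffmpeg build that includes libopus.

```python
from pathlib import Path


def decode_cmd(src: Path, dst: Path) -> list[str]:
    """Build ffmpeg arguments that turn an Ogg/Opus voice note into 16 kHz mono PCM WAV."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", str(src),        # incoming voice note
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        str(dst),
    ]


def encode_cmd(src: Path, dst: Path) -> list[str]:
    """Build ffmpeg arguments that re-encode synthesized speech as Ogg/Opus for the reply."""
    return [
        "ffmpeg", "-y",
        "-i", str(src),        # WAV produced by the TTS step
        "-c:a", "libopus",     # assumes ffmpeg was built with libopus
        "-b:a", "32k",         # illustrative bitrate for speech
        str(dst),
    ]
```

Each list can be passed to subprocess.run(cmd, check=True), which raises an error if ffmpeg exits with a failure code.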

Installation

To add the skill to your environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/rubenfb23/vocal-chat

Make sure the required dependencies are installed on your system: ffmpeg, whisper-cpp, and the sherpa-onnx-tts binary. The skill's local processing depends on all three.
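A quick way to confirm the dependencies are reachable is to look them up on PATH. The sketch below assumes the executables are named exactly as in this listing (ffmpeg, whisper-cpp, sherpa-onnx-tts); your packages may install them under different names, in which case adjust the tuple. The helper name is hypothetical.

```python
import shutil

# Assumption: executable names match the listing; some installs use other names.
REQUIRED_TOOLS = ("ffmpeg", "whisper-cpp", "sherpa-onnx-tts")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```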

Use Cases

This skill is ideal for users on the move who cannot safely or conveniently type on their devices. It is perfect for voice-to-voice brainstorming sessions, hands-free automation management while driving or commuting, and users who prefer spoken language for more expressive interaction. It also serves as a robust accessibility tool for those who prefer verbal communication over textual input.

Example Prompts

  1. "Activate walkie-talkie mode; from now on I want to reply to you only with voice messages."
  2. "Let's talk by voice from now on. Can you summarize the key points of my last meeting?"
  3. "Hey, turn off walkie-talkie mode when we finish this session."

Tips & Limitations

To keep the required Real-Time Factor (RTF) below 0.5, make sure your machine has enough CPU headroom for local inference. RTF is processing time divided by audio duration, so an RTF of 0.5 means a 10-second message is processed in 5 seconds. Because the skill processes files locally, avoid running heavy background tasks during rapid voice exchanges, which can cause stuttering in the audio response. Both text and audio are sent for clarity, but the audio file is the primary medium. If transcription fails, verify that your local ffmpeg build can decode the input audio format.
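To check whether a machine meets the RTF target, time a processing step and divide by the length of the clip. The sketch below is an illustration only: the helper names are hypothetical, and how you obtain the clip duration is up to you.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Example: 3 seconds of processing for an 8-second voice note.
rtf = real_time_factor(3.0, 8.0)
print(rtf, rtf < 0.5)  # 0.375 True
```

Wrapping the transcription or synthesis call in timed() gives the processing_seconds value to pass to real_time_factor().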

Metadata

Author: @rubenfb23
Stars: 1133
Views: 1
Updated: 2026-02-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-rubenfb23-vocal-chat": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags(AI)

#whatsapp #voice #speech-to-text #tts #assistant
Safety Score: 4/5

Flags: file-read, file-write, code-execution