Official | Verified | communication | Safety 4/5

walkie-talkie

Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.

Why use this skill?

Enable voice-to-voice conversations on WhatsApp with the OpenClaw walkie-talkie skill. Transcription and TTS both run locally, giving you a hands-free way to interact with the agent.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/rubenfb23/walkie-talkie

What This Skill Does

The walkie-talkie skill turns your OpenClaw agent into a voice-responsive assistant for WhatsApp. It creates a voice-to-voice loop by combining local transcription and text-to-speech (TTS) engines. When enabled, incoming WhatsApp voice notes are automatically transcribed to text with whisper-cpp, and the agent's generated responses are synthesized into audio files with sherpa-onnx-tts. Users can therefore interact with the agent entirely through natural speech, much like a real-time conversation.
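The transcription half of that loop can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: it assumes the voice note is first decoded by ffmpeg to 16 kHz mono WAV, and that the whisper.cpp command-line binary is installed as `whisper-cli` (the binary name and the model path are assumptions that may differ on your system). The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_cmd(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg command that decodes a voice note to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", str(src),        # input voice note (for example Ogg/Opus)
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        str(dst),
    ]


def whisper_cmd(wav: Path, model: Path, binary: str = "whisper-cli") -> list[str]:
    """Build a whisper.cpp command. The binary name is an assumption."""
    # -m: model file, -f: input audio, -nt: print text without timestamps
    return [binary, "-m", str(model), "-f", str(wav), "-nt"]


def transcribe(voice_note: Path, model: Path) -> str:
    """Decode the voice note, run whisper.cpp on it, and return the transcript."""
    wav = voice_note.with_suffix(".wav")
    subprocess.run(ffmpeg_to_wav_cmd(voice_note, wav), check=True, capture_output=True)
    result = subprocess.run(
        whisper_cmd(wav, model), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

Building the commands in separate functions keeps them easy to inspect or log before anything is executed. The TTS half is omitted here because the exact sherpa-onnx-tts invocation depends on the voice model in use.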

Installation

To integrate this capability into your OpenClaw instance, run the following command in your terminal:

clawhub install openclaw/skills/skills/rubenfb23/walkie-talkie

Ensure you have the required local dependencies installed, specifically ffmpeg, whisper-cpp, and sherpa-onnx-tts, as the skill relies on these binaries for local execution.
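A quick way to confirm the dependencies are reachable is to look them up on your PATH. The sketch below assumes the executables are named exactly as the dependencies are listed here; some distributions install them under other names, so adjust the tuple as needed. The function name is hypothetical.

```python
import shutil

# Assumed executable names, taken from the dependency list above.
REQUIRED_TOOLS = ("ffmpeg", "whisper-cpp", "sherpa-onnx-tts")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```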

Use Cases

  • Hands-free assistance: Ideal for users who are driving, cooking, or otherwise occupied and cannot type messages.
  • Accessibility: Provides an intuitive way for visually impaired users to interact with the OpenClaw agent effectively.
  • Contextual communication: Useful when a user prefers the nuance and tone of spoken language over text for complex instructions.
  • Rapid dialogue: Handles quick status updates or brief inquiries where a voice note is faster than typing.

Example Prompts

  1. "Activate walkie-talkie mode, I need you to help me with the shopping list while I drive."
  2. "Let's talk by voice from now on, I'd rather not type messages."
  3. [User sends a voice note asking: "What's the weather like in Madrid today?"]

Tips & Limitations

  • Performance: The skill is optimized for low latency with an RTF (Real-Time Factor, the processing time divided by the duration of the audio) below 0.5. To maintain this, ensure your hardware meets the requirements for local whisper-cpp inference.
  • Hybrid Output: The skill is configured to send both text and audio. Text provides a safety net for clarity and accessibility, while audio provides the conversational experience.
  • Privacy: By using local tools, all audio processing remains on-device, enhancing privacy compared to cloud-based alternatives.
  • Constraints: Currently, only WhatsApp voice notes in the Ogg/Opus format are supported natively.
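Two of the points above lend themselves to a quick check: the real-time factor and the input format. The sketch below is illustrative only and uses hypothetical helper names. It computes RTF as processing time divided by audio duration, and sniffs the first bytes of a file for the `OggS` page marker and the `OpusHead` identification header that an Ogg/Opus stream begins with.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def looks_like_ogg_opus(header: bytes) -> bool:
    """Heuristic check on the first bytes of a file for an Ogg container carrying Opus."""
    return header.startswith(b"OggS") and b"OpusHead" in header[:64]


def file_is_ogg_opus(path: str) -> bool:
    """Read the first 64 bytes of a file and apply the heuristic above."""
    with open(path, "rb") as fh:
        return looks_like_ogg_opus(fh.read(64))
```

For example, transcribing a 6-second voice note in 1.5 seconds gives an RTF of 0.25, which is within the 0.5 figure quoted above.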

Metadata

Author: @rubenfb23
Stars: 1133
Views: 1
Updated: 2026-02-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-rubenfb23-walkie-talkie": {
      "enabled": true,
      "auto_update": true
    }
  }
}
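If your clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The following sketch merges the entry programmatically instead. It assumes only the structure shown above (a top-level "plugins" object keyed by plugin ID); the function names and the file location you pass in are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-rubenfb23-walkie-talkie"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of config with the plugin enabled, keeping other plugins intact."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path: Path) -> None:
    """Load clawhub.json if it exists, merge the plugin entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = enable_plugin(config)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Calling `update_config_file(Path("clawhub.json"))` from the directory that holds your configuration would apply the change in place.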

Tags (AI)

#whatsapp #voice #speech #automation #assistant
Safety Score: 4/5

Flags: file-read, file-write, code-execution