Official | Verified | communication | Safety: 4/5

whatsapp-voice-talk

Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.

Why use this skill?

Transform WhatsApp into a voice-powered assistant with OpenClaw. Transcribe audio, detect intent, and trigger custom handlers effortlessly.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/syedateebulislam/whatsapp-voice-chat-integration-open-source

What This Skill Does

The whatsapp-voice-talk skill is a specialized communication pipeline that integrates real-time voice processing into WhatsApp through the OpenClaw agent ecosystem. It acts as an intermediary layer: it takes incoming audio files, transcribes them with OpenAI's Whisper model, classifies the intent, triggers the matching backend handler, and returns an audio response. It natively supports English and Hindi, so bots can accept voice notes as well as text.
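The transcribe, classify, handle flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's own code: the function names, the "base" model size and the handler signature are hypothetical. Transcription uses the openai-whisper package, whose transcribe() result includes the detected language; the intent classifier and handlers are passed in so you can plug in your own. The TTS step is left out.

```python
from functools import lru_cache
from typing import Callable, Dict, Tuple

# Hypothetical names throughout; this is not the skill's own code.
Transcriber = Callable[[str], Tuple[str, str]]   # path -> (text, language code)
Handler = Callable[[str, str], str]              # (text, language) -> reply text


@lru_cache(maxsize=1)
def load_whisper(model_name: str = "base"):
    """Load a Whisper model once and reuse it (loading dominates per-message latency)."""
    import whisper  # openai-whisper; imported lazily so the rest runs without it

    return whisper.load_model(model_name)


def whisper_transcribe(path: str) -> Tuple[str, str]:
    """Transcribe one audio file; with no language argument, Whisper auto-detects it."""
    result = load_whisper().transcribe(path)
    return result["text"].strip(), result["language"]


def process_voice_note(
    path: str,
    classify: Callable[[str], str],
    handlers: Dict[str, Handler],
    transcribe: Transcriber = whisper_transcribe,
) -> str:
    """Transcribe -> classify intent -> run handler, returning the reply text.

    `handlers` must contain an "unknown" entry, used as the fallback.
    Converting the reply to audio (TTS) and sending it is omitted here.
    """
    text, language = transcribe(path)
    intent = classify(text)
    handler = handlers.get(intent, handlers["unknown"])
    return handler(text, language)
```

Injecting `transcribe` keeps the orchestration testable without audio files or a model download, and `lru_cache(maxsize=1)` is one simple way to keep the model in memory so only the first message pays the load cost.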

Installation

To get started, make sure Python 3.8+ and Node.js are installed on your system (the listener daemon runs under node). Whisper also requires the ffmpeg command-line tool to decode audio.

Install the Python dependencies: pip install openai-whisper soundfile numpy
Install the skill with Clawhub: clawhub install openclaw/skills/skills/syedateebulislam/whatsapp-voice-chat-integration-open-source
Start the background processing daemon: node scripts/voice-listener-daemon.js

The daemon monitors the ~/.clawdbot/media/inbound/ directory for new voice clips, so your WhatsApp bot stays responsive without manual file management.
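The listener itself is a Node script, but its core idea, watching the inbound directory and handing each new clip to the pipeline, can be sketched in Python with simple polling. This is an illustration, not scripts/voice-listener-daemon.js: the function names, the extension list and the one-second interval are assumptions.

```python
import time
from pathlib import Path
from typing import Callable, List, Set

# Assumptions: the extension set and polling interval are illustrative, not taken from the daemon.
AUDIO_EXTENSIONS = {".ogg", ".opus", ".wav", ".mp3"}
INBOUND_DIR = Path.home() / ".clawdbot" / "media" / "inbound"


def new_audio_files(directory: Path, seen: Set[Path]) -> List[Path]:
    """Return audio files in `directory` not yet in `seen` (oldest first) and mark them seen."""
    if not directory.is_dir():
        return []
    fresh = [
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS and p not in seen
    ]
    fresh.sort(key=lambda p: p.stat().st_mtime)
    seen.update(fresh)
    return fresh


def watch(handle: Callable[[Path], None], directory: Path = INBOUND_DIR,
          interval: float = 1.0) -> None:
    """Poll `directory` forever, calling `handle` once for each new audio file."""
    seen: Set[Path] = set()
    while True:
        for path in new_audio_files(directory, seen):
            try:
                handle(path)
            except Exception as exc:  # one bad file should not stop the watcher
                print(f"failed to process {path}: {exc}")
        time.sleep(interval)
```

Calling `watch(print)` would simply log each new clip. A production watcher would also need to skip files that are still being written and to persist `seen` across restarts; this sketch does neither.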

Use Cases

This skill fits into a range of automation workflows:

Smart Home Automation: Convert voice commands into trigger signals for IoT hardware.
Voice-Enabled Task Management: Add items to lists or set reminders while driving or multitasking.
Multi-lingual Customer Support: Handle customer inquiries in English or Hindi, providing instant voice responses.
System Monitoring: Receive verbal status updates on server health or operational metrics without needing to check a dashboard.

Example Prompts

"What is the current weather forecast for New Delhi?"
"Add eggs, bread, and milk to my grocery list."
"Is the production server currently online or are there issues?"

Tips & Limitations

Processing typically takes 5-10 seconds per message and depends heavily on how long the Whisper model takes to load, so in production keep the model pre-loaded in memory.
The system is designed to handle common audio formats (OGG, WAV, MP3); WhatsApp voice notes arrive as OGG/Opus.
Give the voice-listener-daemon read/write permission on the ~/.clawdbot/ directory; otherwise processing fails.
Intent detection is customizable, but complex conversational flows may require expanding the INTENTS map in voice-processor.js with more precise regex or keyword matching.
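To show what regex-based intent matching can look like, here is a small Python intent table. It is written in the spirit of the INTENTS map in voice-processor.js but is not that map: the intent names and patterns are illustrative assumptions, chosen to cover the example prompts on this page plus one Hindi phrase.

```python
import re
from typing import Dict, List

# Illustrative intent table; names and patterns are assumptions, not the skill's INTENTS map.
# Intents are checked in insertion order and the first match wins.
INTENTS: Dict[str, List[re.Pattern]] = {
    "weather": [
        re.compile(r"\b(weather|forecast|temperature)\b", re.IGNORECASE),
        re.compile("मौसम"),  # Hindi "weather"; plain substring match, since \b is unreliable with Devanagari vowel signs
    ],
    "list_add": [re.compile(r"\badd\b.*\bto (my )?\w+ list\b", re.IGNORECASE)],
    "status": [re.compile(r"\b(server|status|online|down)\b", re.IGNORECASE)],
}


def detect_intent(text: str) -> str:
    """Return the first intent with a pattern that matches the transcript, else "unknown"."""
    for intent, patterns in INTENTS.items():
        if any(pattern.search(text) for pattern in patterns):
            return intent
    return "unknown"
```

Because the first match wins, ordering matters: "is the weather server down" is classified as weather, not status. Reordering intents or tightening patterns is the kind of tuning that complex flows tend to need.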

Metadata

Author: @syedateebulislam

Stars: 982

Updated: 2026-02-14

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-syedateebulislam-whatsapp-voice-chat-integration-open-source": {
      "enabled": true,
      "auto_update": true
    }
  }
}
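Rather than pasting the snippet by hand, you could merge it into an existing clawhub.json so that any plugins already configured are preserved. The helper below is a hypothetical convenience script, not part of Clawhub; the listing does not say where clawhub.json lives, so the path is a parameter, and enable_plugin is an illustrative name.

```python
import json
from pathlib import Path

# Plugin key copied from the configuration snippet in this listing.
PLUGIN_ID = "official-syedateebulislam-whatsapp-voice-chat-integration-open-source"


def enable_plugin(config_path: Path, auto_update: bool = True) -> dict:
    """Add or update this plugin's entry in clawhub.json, keeping all other settings."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    entry = config.setdefault("plugins", {}).setdefault(PLUGIN_ID, {})
    entry.update({"enabled": True, "auto_update": auto_update})
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Using setdefault at both levels means existing plugins, and any extra keys already set on this plugin's entry, survive the merge.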

Tags (AI)

#whatsapp #voice-processing #whisper #chatbot #automation

Safety Score: 4/5

Flags: file-read, file-write, external-api
