whatsapp-voice-talk
Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.
Why use this skill?
Transform WhatsApp into a voice-powered assistant with OpenClaw. Transcribe audio, detect intent, and trigger custom handlers effortlessly.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/syedateebulislam/whatsapp-voice-chat-integration-open-source
What This Skill Does
The whatsapp-voice-talk skill is a specialized communication pipeline designed to integrate real-time voice processing capabilities directly into WhatsApp using the OpenClaw agent ecosystem. It acts as an intermediary layer that takes incoming audio files, transcribes them using OpenAI's Whisper model, performs intent classification, triggers specific backend handlers, and returns audio responses. It natively supports English and Hindi, providing seamless conversational interfaces for bots that need to handle voice inputs instead of just text.
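The pipeline described above (transcribe, classify intent, run a handler, reply) can be sketched in Python as follows. This is a minimal illustration, not the skill's actual code. The names Transcript, whisper_transcriber and process_voice_note are hypothetical, and the TTS step that turns the reply text into audio is left out. Only whisper.load_model and model.transcribe come from the openai-whisper package.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Transcript:
    text: str
    language: str  # language code reported by the transcriber, e.g. "en" or "hi"


def whisper_transcriber(model_name: str = "base") -> Callable[[str], Transcript]:
    """Build a transcribe function backed by openai-whisper (model loaded once)."""
    import whisper  # imported lazily so the rest of this sketch runs without it

    model = whisper.load_model(model_name)

    def transcribe(audio_path: str) -> Transcript:
        # transcribe() returns a dict with "text", "segments" and "language"
        result = model.transcribe(audio_path)
        return Transcript(text=result["text"].strip(), language=result["language"])

    return transcribe


def process_voice_note(
    audio_path: str,
    transcribe: Callable[[str], Transcript],
    detect_intent: Callable[[str], str],
    handlers: Dict[str, Callable[[Transcript], str]],
    fallback: Callable[[Transcript], str],
) -> str:
    """Transcribe one voice note, route it to a handler by intent, return reply text."""
    transcript = transcribe(audio_path)
    intent = detect_intent(transcript.text)
    handler = handlers.get(intent, fallback)
    return handler(transcript)
```

Passing the transcriber and intent detector in as functions keeps the routing logic testable without loading a Whisper model; in a running bot you would build whisper_transcriber() once at startup and reuse it.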
Installation
To get started, make sure Python 3.8+ is installed on your system. Then:
- Install the Python dependencies: pip install openai-whisper soundfile numpy (Whisper also needs the ffmpeg command-line tool on your PATH to decode audio).
- Install the skill with the clawhub CLI: clawhub install openclaw/skills/skills/syedateebulislam/whatsapp-voice-chat-integration-open-source
- Start the background processing daemon (requires Node.js): node scripts/voice-listener-daemon.js
The daemon monitors the ~/.clawdbot/media/inbound/ directory for new voice clips and processes them as they arrive, so your WhatsApp bot stays responsive without manual file management.
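The bundled daemon is a Node.js script whose internals are not documented here. As a rough illustration of the same idea, the Python sketch below polls the inbound directory and reports audio files it has not seen before. The function names, the one-second polling interval, and the suffix list (taken from the formats named under Tips & Limitations) are assumptions, not the daemon's actual behavior.

```python
import time
from pathlib import Path
from typing import Callable, List, Set

# Formats listed under Tips & Limitations; WhatsApp voice notes usually arrive as .ogg
AUDIO_SUFFIXES = {".ogg", ".wav", ".mp3"}
INBOUND_DIR = Path.home() / ".clawdbot" / "media" / "inbound"


def new_voice_clips(directory: Path, seen: Set[Path]) -> List[Path]:
    """Return audio files in `directory` not yet in `seen` (oldest first) and mark them seen."""
    if not directory.is_dir():
        return []
    fresh = [
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES and p not in seen
    ]
    fresh.sort(key=lambda p: p.stat().st_mtime)
    seen.update(fresh)
    return fresh


def watch(directory: Path, on_clip: Callable[[Path], None], interval: float = 1.0) -> None:
    """Poll `directory` forever, calling `on_clip` once for each new audio file."""
    seen: Set[Path] = set()
    while True:
        for clip in new_voice_clips(directory, seen):
            on_clip(clip)
        time.sleep(interval)
```

For example, watch(INBOUND_DIR, lambda clip: print("new clip:", clip)) would print each clip as it lands. Polling is simpler and more portable than file-system notifications, at the cost of up to one interval of extra latency. A production watcher would also need to skip files that are still being written, which this sketch does not handle.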
Use Cases
The skill fits into a range of automation workflows:
- Smart Home Automation: Convert voice commands into trigger signals for IoT hardware.
- Voice-Enabled Task Management: Add items to lists or set reminders while driving or multitasking.
- Multilingual Customer Support: Handle customer inquiries in English or Hindi and reply with spoken responses.
- System Monitoring: Receive verbal status updates on server health or operational metrics without needing to check a dashboard.
Example Prompts
- "What is the current weather forecast for New Delhi?"
- "Add eggs, bread, and milk to my grocery list."
- "Is the production server currently online or are there issues?"
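One simple way to map prompts like these to intents is a table of regular expressions checked in order, similar in spirit to the INTENTS map the skill keeps in voice-processor.js. The Python sketch below is a hypothetical stand-in: the intent names and patterns are illustrative and not taken from the skill's source. The weather pattern also matches the Hindi word मौसम ("weather"), since Whisper typically writes Hindi transcripts in Devanagari.

```python
import re
from typing import Dict

# Hypothetical intents and patterns; the real INTENTS map may use different keys.
INTENTS: Dict[str, re.Pattern] = {
    "weather": re.compile(r"\b(weather|forecast|temperature)\b|मौसम", re.IGNORECASE),
    "add_to_list": re.compile(r"\badd\b.*\b(list|groceries)\b", re.IGNORECASE),
    "status": re.compile(r"\b(status|online|server|down)\b", re.IGNORECASE),
}


def detect_intent(text: str, default: str = "unknown") -> str:
    """Return the first intent whose pattern matches `text`, in dict order."""
    for name, pattern in INTENTS.items():
        if pattern.search(text):
            return name
    return default
```

Because the first matching pattern wins, more specific intents should come before broad ones as the map grows.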
Tips & Limitations
Processing typically takes 5-10 seconds per message and depends heavily on how long the Whisper model takes to load. In production, keep the model pre-loaded in memory rather than loading it for each message. The system is designed to handle common audio formats (OGG, WAV, MP3). Make sure the voice-listener daemon has read/write permissions on the ~/.clawdbot/ directory; otherwise processing will fail. Intent detection is customizable, but complex conversational flows may require expanding the INTENTS map in voice-processor.js with more precise regex or keyword patterns.
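To keep the model pre-loaded, a small cache can load each Whisper model once and hand the same instance to every message. The ModelCache class below is a hypothetical helper, not part of the skill. "base" is one of the standard openai-whisper model names, but which model the skill actually uses is not stated here.

```python
import threading
from typing import Any, Callable, Dict, Optional


def _load_whisper(name: str) -> Any:
    import whisper  # openai-whisper; imported lazily so this module loads without it
    return whisper.load_model(name)


class ModelCache:
    """Load each model at most once and reuse it for every message."""

    def __init__(self, loader: Optional[Callable[[str], Any]] = None) -> None:
        self._loader = loader or _load_whisper
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, name: str = "base") -> Any:
        # The lock stops two concurrent messages from loading the same model twice.
        with self._lock:
            if name not in self._models:
                self._models[name] = self._loader(name)
            return self._models[name]
```

Calling cache.get("base") once at startup moves the load cost out of the first message; later calls return the cached model immediately, e.g. cache.get("base").transcribe(path).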
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-syedateebulislam-whatsapp-voice-chat-integration-open-source": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, file-write, external-api
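Pasting the snippet by hand can overwrite other plugin entries if clawhub.json already has a plugins object. The Python sketch below merges the entry instead. It assumes clawhub.json is plain JSON with a top-level "plugins" object, as the snippet suggests; the enable_plugin function is hypothetical, and the file's location depends on your setup.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-syedateebulislam-whatsapp-voice-chat-integration-open-source"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update the plugin entry in clawhub.json, keeping other plugins intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    plugins = config.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```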