voice-assistant
Real-time voice assistant for OpenClaw. Streams mic audio through configurable STT (Deepgram or ElevenLabs) into your OpenClaw agent, then speaks the response via configurable TTS (Deepgram Aura or ElevenLabs). Sub-2s time-to-first-audio with full streaming at every stage.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/charantejmandali18/voice-assistant
What This Skill Does
The voice-assistant skill turns your OpenClaw agent into a real-time conversational partner. It bridges microphone audio from your browser to an OpenAI-compatible gateway, giving you a voice-to-voice loop. The architecture targets low latency: time-to-first-audio stays under 2 seconds because every link in the chain streams continuously, from the browser microphone to the STT processor, through the LLM, and back out via the TTS engine. It supports Deepgram and ElevenLabs for both STT and TTS, so you can balance cost, speed, and voice realism.
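The streaming chain described above can be illustrated with a toy sketch. This is not the skill's actual code: the stage functions (`fake_stt`, `fake_llm`, `fake_tts`, `mic`) are hypothetical stand-ins, and a real LLM stage would not reply word by word. The point is only that when each stage is an async generator, output audio can start flowing before the input has finished.

```python
import asyncio
from typing import AsyncIterator, List


async def mic(words: List[str], events: List[str]) -> AsyncIterator[bytes]:
    """Hypothetical microphone: emits one audio chunk per word."""
    for word in words:
        events.append(f"in:{word}")
        await asyncio.sleep(0)  # give other tasks a chance to run
        yield word.encode()


async def fake_stt(chunks: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in for streaming STT: turns each chunk into text as it arrives."""
    async for chunk in chunks:
        yield chunk.decode()


async def fake_llm(texts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM: emits a token per incoming piece of text."""
    async for text in texts:
        yield text.upper()


async def fake_tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stand-in for streaming TTS: emits an audio chunk per token."""
    async for token in tokens:
        yield f"<audio:{token}>".encode()


async def run_pipeline(words: List[str]) -> List[str]:
    """Chain the stages and record the order of input and output events."""
    events: List[str] = []
    async for audio in fake_tts(fake_llm(fake_stt(mic(words, events)))):
        events.append(f"out:{audio.decode()}")
    return events
```

Running `asyncio.run(run_pipeline(["hi", "there"]))` interleaves input and output events, which is the property that keeps time-to-first-audio low: no stage waits for the previous one to finish its whole input.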
Installation
To get started, first ensure your OpenClaw environment is configured. Use the following command in your terminal to integrate the skill:
clawhub install openclaw/skills/skills/charantejmandali18/voice-assistant
Navigate to the skill's base directory, copy the environment template (cp .env.example .env), and fill in the API keys for your chosen providers. Once configured, launch the server with uv run scripts/server.py, then open http://localhost:7860 in your browser to start talking to the agent.
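Before launching the server, it can help to confirm that every key listed in the template has a value in your copy. The following is a minimal sketch, assuming both files use plain KEY=VALUE lines; the helper names (`parse_env`, `unfilled_keys`, `check_files`) are hypothetical and not part of the skill.

```python
from pathlib import Path
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_keys(example_text: str, env_text: str) -> List[str]:
    """Return keys named in the template that are absent or empty in .env."""
    env = parse_env(env_text)
    return [key for key in parse_env(example_text) if not env.get(key)]


def check_files(directory: str = ".") -> List[str]:
    """Compare .env against .env.example in the given directory."""
    base = Path(directory)
    example_text = (base / ".env.example").read_text()
    env_text = (base / ".env").read_text()
    return unfilled_keys(example_text, env_text)
```

Calling `check_files()` from the skill's base directory returns the names of keys that still need a value. Keys for a provider you do not use may legitimately stay empty, so treat the result as a reminder rather than an error.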
Use Cases
This skill is perfect for scenarios requiring hands-free agent interaction. Use it for voice-controlled home automation, conducting mock interviews where the agent provides instant feedback, or as a real-time brainstorming assistant that captures your spoken thoughts without the need for manual transcription. It is also highly effective for accessibility-focused workflows where typing is not the primary input method.
Example Prompts
- "Hey, I'm working on a Python script for data processing. Can you walk me through the best way to optimize a loop that handles large datasets?"
- "Draft a summary of our meeting notes. I want you to focus on the action items for the marketing team and the deadlines we discussed."
- "Summarize the last three messages in this thread and propose a professional response that acknowledges the client's concern regarding the budget."
Tips & Limitations
For the best experience, use a stable network connection, since high jitter can disrupt the WebSocket streaming. If you experience lag, try lowering your audio sample rate or switching to a faster STT provider. The VAD (Voice Activity Detection) silence threshold can be tuned via VOICE_VAD_SILENCE_MS: increase the value if the agent cuts you off mid-sentence, or decrease it if the agent is slow to respond after you stop speaking.
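To see why VOICE_VAD_SILENCE_MS trades interruptions against responsiveness, here is a minimal sketch of silence-based end-of-turn detection. It is an illustration under stated assumptions, not the skill's implementation: the function names, the per-frame speech flags, and the 800 ms fallback default are all hypothetical. Only the environment variable name comes from the text above.

```python
import os
from typing import List, Optional


def silence_ms_from_env(default: int = 800) -> int:
    """Read VOICE_VAD_SILENCE_MS; the fallback default is an assumption."""
    return int(os.environ.get("VOICE_VAD_SILENCE_MS", default))


def end_of_turn_index(
    frames_are_speech: List[bool], frame_ms: int, silence_ms: int
) -> Optional[int]:
    """Return the frame index where the turn ends, or None if it never does.

    The turn ends once speech has been heard and the trailing run of
    silent frames reaches silence_ms.
    """
    silent_run = 0
    heard_speech = False
    for i, is_speech in enumerate(frames_are_speech):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence timer
        elif heard_speech:
            silent_run += frame_ms
            if silent_run >= silence_ms:
                return i
    return None


# Two speech frames, a 300 ms pause, one more speech frame, then silence.
frames = [True, True, False, False, False, True] + [False] * 10
cut_off_early = end_of_turn_index(frames, frame_ms=100, silence_ms=300)
waits_for_speaker = end_of_turn_index(frames, frame_ms=100, silence_ms=500)
```

With a 300 ms threshold the turn ends during the mid-sentence pause (frame 4), while a 500 ms threshold rides out the pause and ends the turn only after the final word (frame 10), at the cost of a longer wait before the agent replies.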
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-charantejmandali18-voice-assistant": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: network-access, external-api