What This Skill Does

The voice-assistant skill turns your OpenClaw agent into a real-time conversational partner. It bridges audio input from your browser directly to an OpenAI-compatible gateway, giving you a voice-to-voice experience. The architecture is built for low latency: it reaches sub-2-second time-to-first-audio by streaming continuously at every link in the chain, from the browser microphone to the speech-to-text (STT) processor, through the LLM, and back out via the text-to-speech (TTS) engine. It supports providers such as Deepgram and ElevenLabs, letting you balance cost, speed, and voice realism.
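To illustrate why streaming at every stage helps time-to-first-audio, here is a minimal sketch of a chained pipeline using Python async generators. It is not the skill's actual implementation: the stage functions (microphone, stt, llm, tts) are hypothetical stand-ins that only transform strings, and the events list exists purely to show the order of work.

```python
import asyncio
from typing import AsyncIterator, List

events: List[str] = []  # records the order in which stages do work


async def microphone(n_frames: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for audio frames arriving from the browser."""
    for i in range(n_frames):
        events.append(f"mic:{i}")
        yield f"frame-{i}".encode()
        await asyncio.sleep(0)  # hand control back to the event loop


async def stt(frames: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Hypothetical STT stage: one partial transcript per frame."""
    async for frame in frames:
        yield frame.decode()


async def llm(texts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Hypothetical LLM stage: one reply fragment per transcript."""
    async for text in texts:
        yield f"reply({text})"


async def tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Hypothetical TTS stage: one audio chunk per reply fragment."""
    async for token in tokens:
        yield token.encode()


async def run_pipeline(n_frames: int = 3) -> List[bytes]:
    audio_out: List[bytes] = []
    # Each stage pulls from the previous one, so nothing is buffered whole.
    async for chunk in tts(llm(stt(microphone(n_frames)))):
        events.append(f"audio:{len(audio_out)}")
        audio_out.append(chunk)
    return audio_out


audio = asyncio.run(run_pipeline())
print(events)
```

In this sketch the first audio chunk is produced before the second microphone frame is even read, which is the property a fully streamed chain relies on. A real pipeline would add buffering, end-of-turn detection, and network I/O at each stage.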

Installation

To get started, first ensure your OpenClaw environment is configured. Use the following command in your terminal to integrate the skill:

clawhub install openclaw/skills/skills/charantejmandali18/voice-assistant

Navigate to the skill's base directory, copy the environment template (cp .env.example .env), and fill in the API keys for your chosen providers. Then launch the server with uv run scripts/server.py and open http://localhost:7860 in your browser to start interacting.
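Before launching the server, it can help to confirm that no key in .env was left empty. The helper below is a rough sketch, assuming the file uses plain KEY=VALUE lines; the variable names in the sample are invented for illustration and are not the skill's actual settings.

```python
from pathlib import Path
from typing import List


def blank_env_keys(env_text: str) -> List[str]:
    """Return names of variables whose value is empty in .env-style text."""
    blanks = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat empty quotes ("" or '') the same as no value at all.
        if not value.strip().strip("'\""):
            blanks.append(key.strip())
    return blanks


# Hypothetical key names, used only to demonstrate the check.
sample = '# providers\nEXAMPLE_STT_KEY=abc123\nEXAMPLE_TTS_KEY=\nEXAMPLE_LLM_KEY=""\n'
missing = blank_env_keys(sample)
print(missing)

# Run against the real file only if it exists in the current directory.
if Path(".env").exists():
    print(blank_env_keys(Path(".env").read_text()))
```

The check prints only variable names, never their values, so it is safe to run in a shared terminal.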

Use Cases

This skill suits scenarios that call for hands-free agent interaction. Use it for voice-controlled home automation, for mock interviews where the agent gives instant feedback, or as a real-time brainstorming assistant that captures your spoken thoughts without manual transcription. It also works well in accessibility-focused workflows where typing is not the primary input method.

Example Prompts

"Hey, I'm working on a Python script for data processing. Can you walk me through the best way to optimize a loop that handles large datasets?"
"Draft a summary of our meeting notes. I want you to focus on the action items for the marketing team and the deadlines we discussed."
"Summarize the last three messages in this thread and propose a professional response that acknowledges the client's concern regarding the budget."

Tips & Limitations

For the best experience, use a stable network connection, since high jitter can disrupt the WebSocket streaming. If you experience lag, try lowering your audio sample rate or switching to a faster STT provider. The voice activity detection (VAD) silence threshold can be tuned via VOICE_VAD_SILENCE_MS: increase the value if the agent cuts you off mid-sentence, or decrease it if the agent takes too long to respond after you stop speaking.
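The trade-off behind VOICE_VAD_SILENCE_MS can be sketched with a simple end-of-turn rule: the turn ends once silence has lasted at least the threshold. This is an illustrative model, not the skill's actual VAD; the 800 ms fallback, the 20 ms frame size, and the find_end_of_turn helper are all assumptions.

```python
import os
from typing import Iterable, Optional

# VOICE_VAD_SILENCE_MS is the setting named above; the 800 ms fallback is an
# assumed placeholder, not the skill's documented default.
SILENCE_MS = int(os.environ.get("VOICE_VAD_SILENCE_MS", "800"))


def find_end_of_turn(
    is_speech: Iterable[bool],
    frame_ms: int = 20,
    silence_ms: int = SILENCE_MS,
) -> Optional[int]:
    """Return the frame index at which the turn is declared over, or None.

    is_speech holds one boolean per audio frame from some upstream speech
    detector (hypothetical here).
    """
    silent_run_ms = 0
    heard_speech = False
    for i, speaking in enumerate(is_speech):
        if speaking:
            heard_speech = True
            silent_run_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silent_run_ms += frame_ms
            if silent_run_ms >= silence_ms:
                return i
    return None


# 200 ms of speech, a 300 ms mid-sentence pause, 200 ms more speech, then silence.
frames = [True] * 10 + [False] * 15 + [True] * 10 + [False] * 50

print(find_end_of_turn(frames, silence_ms=200))  # ends inside the pause
print(find_end_of_turn(frames, silence_ms=800))  # waits for the real ending
```

With a 200 ms threshold the speaker is cut off during the pause; with 800 ms the pause is tolerated, but the agent waits longer after the final word before replying.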
