gemini-voice-assistant
Voice-to-voice AI assistant using Gemini Live API. Speak to the AI and get spoken responses. Use when you want to have natural voice conversations with an AI assistant powered by Google's Gemini models.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/alimostafaradwan/gemini-voice-assistant
What This Skill Does
The gemini-voice-assistant skill connects OpenClaw to Google's Gemini Live API for real-time voice-to-voice conversation. It takes your spoken input, has a Gemini model interpret it, and returns a spoken response together with a text transcript of that response.
Installation
To install this skill, run the following with the OpenClaw command line utility: clawhub install openclaw/skills/skills/alimostafaradwan/gemini-voice-assistant. After installation, set the GEMINI_API_KEY environment variable, either by exporting it in your terminal session or by adding it to a .env file in the skill's directory. Also confirm that the dependencies are installed: the Python packages google-genai, numpy, soundfile, and librosa, plus the FFmpeg command line tool. These are needed for audio processing and file conversion.
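The setup requirements above can be checked before the first run. The following is a minimal sketch using only the Python standard library, not part of the skill itself: the function names (read_env_file, module_available, preflight) are hypothetical, the .env parser handles only plain KEY=VALUE lines, and it assumes the FFmpeg executable is named ffmpeg on PATH.

```python
import importlib.util
import os
import shutil
from pathlib import Path

# Import names for the Python packages named in the listing.
# "google.genai" is the import path provided by the google-genai package.
REQUIRED_MODULES = ["google.genai", "numpy", "soundfile", "librosa"]


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file; return {} if it is missing."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def module_available(name):
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is absent.
        return False


def preflight(env=None, env_file=".env"):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    key = env.get("GEMINI_API_KEY") or read_env_file(env_file).get("GEMINI_API_KEY")
    if not key:
        problems.append("GEMINI_API_KEY is not set (export it or add it to .env)")
    for name in REQUIRED_MODULES:
        if not module_available(name):
            problems.append(f"Python module not importable: {name}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("MISSING:", problem)
```

Run from the skill's directory, the script prints one line per missing requirement and nothing when everything is in place.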
Use Cases
This skill suits hands-free workflows where dictating a task is faster than typing it. Developers can use it to brainstorm architecture while away from the keyboard, and users who prefer auditory feedback can use it as an accessibility aid. It also works for language practice, since it offers low-latency conversation that responds to your spoken input. The same applies to voice-driven system control and open-ended creative dialogue.
Example Prompts
- "What is the current status of my project build, and can you summarize the last three errors?"
- "Brainstorm five creative project names for a new AI-based financial tracking tool."
- "Summarize the meeting notes I just sent and identify the top three action items for the team."
Tips & Limitations
For best results, provide a clear microphone signal. The skill defaults to the gemini-2.5-flash-native-audio-preview-12-2025 model for low latency. Switching to a text-only model such as gemini-2.0-flash-exp disables voice output. The skill needs a working network connection, because every exchange goes through the external Gemini API in real time. API usage is subject to Google's rate limits and to the billing terms tied to your Gemini API key.
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-alimostafaradwan-gemini-voice-assistant": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, external-api
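If clawhub.json already lists other plugins, pasting the fragment over it would drop them. The sketch below shows one way to merge the entry instead. It assumes clawhub.json is plain JSON with a top-level "plugins" object, as in the fragment above, and that it lives in the current directory; the helper names enable_plugin and update_config_file are hypothetical and not part of the clawhub tooling.

```python
import json
from pathlib import Path

# Plugin identifier copied from the configuration fragment in this listing.
PLUGIN_ID = "official-alimostafaradwan-gemini-voice-assistant"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of config with the plugin enabled, keeping all other keys."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": True})
    plugins[plugin_id] = entry
    merged["plugins"] = plugins
    return merged


def update_config_file(path="clawhub.json"):
    """Read the config (or start empty), enable the plugin, and write it back."""
    config_path = Path(path)  # assumed location; adjust to your setup
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    updated = enable_plugin(config)
    config_path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
    return updated
```

Calling update_config_file() leaves existing plugin entries untouched and only adds or updates the entry for this skill.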