What This Skill Does

The gemini-voice-assistant skill connects OpenClaw to Google's Gemini Live API for real-time voice-to-voice interaction. It takes your spoken input, interprets the request, and returns a spoken response together with a text transcript, so every exchange is available in both audio and text form.

Installation

To install this skill, run the OpenClaw command-line utility: clawhub install openclaw/skills/skills/alimostafaradwan/gemini-voice-assistant. After installation, set the GEMINI_API_KEY environment variable, either by exporting it in your terminal session or by adding it to a .env file in the skill's directory. Then verify that the dependencies are installed: the Python packages google-genai, numpy, soundfile, and librosa, plus the FFmpeg command-line tool. Together these provide audio processing and file conversion.
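The setup described above can be sanity-checked with a short script. The following is a minimal sketch, not part of the skill itself: the function names module_available and check_environment are hypothetical, the import names are assumed from the package list (google-genai is imported as google.genai), and the script only inspects the process environment, so a key stored in a .env file will not be detected unless it has been loaded first.

```python
import importlib.util
import os
import shutil

# Import names assumed for the listed packages (google-genai imports as google.genai).
REQUIRED_MODULES = ["google.genai", "numpy", "soundfile", "librosa"]


def module_available(name: str) -> bool:
    """Return True if the module can be located, without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example `google`) is missing.
        return False


def check_environment(env=None) -> list:
    """Collect human-readable problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    for name in REQUIRED_MODULES:
        if not module_available(name):
            problems.append(f"Python module not found: {name}")
    # FFmpeg is a system binary, so look for it on PATH rather than importing it.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("MISSING:", problem)
```

Running the script prints one line per missing item and prints nothing when the key, the Python modules, and the ffmpeg binary are all found.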

Use Cases

This skill suits hands-free workflows where dictating a task is faster than typing it. Typical users include developers who want to brainstorm architecture away from the keyboard, users who prefer auditory feedback for accessibility reasons, and language learners who want low-latency spoken conversation practice. It can also be used for voice-driven system control or open-ended creative dialogue.

Example Prompts

"What is the current status of my project build, and can you summarize the last three errors?"
"Brainstorm five creative project names for a new AI-based financial tracking tool."
"Summarize the meeting notes I just sent and identify the top three action items for the team."

Tips & Limitations

For best results, provide a clear microphone signal when supplying audio input. The skill defaults to the gemini-2.5-flash-native-audio-preview-12-2025 model, chosen for low latency. Switching to a text-only model such as gemini-2.0-flash-exp disables voice output. The skill depends on real-time calls to an external API, so it needs a stable network connection. API usage is subject to Google's rate limits and to the billing terms attached to your Gemini API key.
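The model trade-off above can be illustrated with a small helper. This is a sketch under stated assumptions: the GEMINI_MODEL environment variable and the resolve_model function are hypothetical (the skill may expose its model setting differently), and treating names that contain "native-audio" as voice-capable is a naming heuristic, not a documented rule.

```python
import os

# Default model named in this section.
DEFAULT_MODEL = "gemini-2.5-flash-native-audio-preview-12-2025"


def resolve_model(env=None):
    """Return (model_name, voice_expected) using a hypothetical GEMINI_MODEL override."""
    env = os.environ if env is None else env
    # Fall back to the default when the override is unset or blank.
    model = env.get("GEMINI_MODEL", "").strip() or DEFAULT_MODEL
    # Heuristic only: assume voice output when the name contains "native-audio".
    voice_expected = "native-audio" in model
    return model, voice_expected


if __name__ == "__main__":
    model, voice_expected = resolve_model()
    if not voice_expected:
        print(f"Warning: {model} may be text-only; expect no spoken response.")
```

A check like this lets a wrapper script warn before a session starts, rather than leaving the user to notice that no audio comes back.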
