Official Verified communication Safety 4/5

gemini-voice-assistant

Voice-to-voice AI assistant using Gemini Live API. Speak to the AI and get spoken responses. Use when you want to have natural voice conversations with an AI assistant powered by Google's Gemini models.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/alimostafaradwan/gemini-voice-assistant

What This Skill Does

The gemini-voice-assistant skill connects OpenClaw to Google's Gemini Live API for real-time voice-to-voice conversation, so you are not limited to a text interface. It takes your spoken input, has a Gemini model interpret it, and returns a spoken response along with a corresponding text transcript.

Installation

To install this skill, run the following command in a terminal: clawhub install openclaw/skills/skills/alimostafaradwan/gemini-voice-assistant. Once it is installed, set the GEMINI_API_KEY environment variable, either by exporting it in your terminal session or by adding it to a .env file in the skill's directory. The skill also depends on google-genai, numpy, soundfile, librosa, and FFmpeg for audio processing and file conversion, so confirm that all of them are installed on your system.
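The requirements above can be checked before the first run. The following is a minimal sketch, not part of the skill itself: the function names are hypothetical, the import name google.genai for the google-genai package and the executable name ffmpeg are assumptions, and the check only looks at the process environment (it does not read a .env file).

```python
import importlib.util
import os
import shutil

# Assumed import names for the packages listed above
# (google-genai is assumed to be importable as "google.genai").
REQUIRED_MODULES = ["google.genai", "numpy", "soundfile", "librosa"]


def module_available(name: str) -> bool:
    """Return True if the module can be located, without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package is missing.
        return False


def preflight(env=None) -> list[str]:
    """Collect human-readable problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    # Assumption: FFmpeg is installed as an executable named "ffmpeg" on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    for name in REQUIRED_MODULES:
        if not module_available(name):
            problems.append(f"Python module {name} is not installed")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("MISSING:", problem)
```

Running the script prints one line per missing requirement and nothing when all checks pass.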

Use Cases

This skill suits hands-free workflows where dictating a task is faster than typing it. Developers can use it to brainstorm architecture while away from the keyboard, and users who prefer auditory feedback can use it for accessibility. It also works for language practice, since it responds conversationally and with low latency to spoken input. It can be used both for voice-driven tasks and for open-ended dialogue.

Example Prompts

  1. "What is the current status of my project build, and can you summarize the last three errors?"
  2. "Brainstorm five creative project names for a new AI-based financial tracking tool."
  3. "Summarize the meeting notes I just sent and identify the top three action items for the team."

Tips & Limitations

For best results, provide audio input with a clear microphone signal. The skill defaults to the gemini-2.5-flash-native-audio-preview-12-2025 model, which is chosen for low latency. Switching to a text-only model such as gemini-2.0-flash-exp disables voice output. The skill needs a working network connection because it communicates with an external API in real time. API usage is subject to Google's rate limits and to the billing associated with your Gemini API key.
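Because requests can fail under rate limits or a flaky network, a caller may want to retry with increasing delays. The sketch below shows generic exponential backoff; call_with_backoff is a hypothetical helper, not something the skill provides, and it catches every exception because the error types raised by the API are not stated here. Real code should narrow the except clause.

```python
import time


def call_with_backoff(call, retries=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); after a failure wait base_delay * 2**attempt, then retry.

    The last failure is re-raised once all retries are used up.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception:  # assumption: the real error type is unknown
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter is injectable so the delays can be observed or skipped in tests.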

Metadata

Stars: 4473
Views: 1
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-alimostafaradwan-gemini-voice-assistant": {
      "enabled": true,
      "auto_update": true
    }
  }
}
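If clawhub.json already contains other plugins, pasting the fragment by hand risks overwriting them. The following sketch merges the entry into an existing configuration instead. It assumes only the layout shown above (a top-level "plugins" object keyed by plugin ID); enable_plugin and update_file are hypothetical helper names, and the file location is left to the caller.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-alimostafaradwan-gemini-voice-assistant"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of config with the plugin entry added or updated."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": True})
    plugins[plugin_id] = entry
    merged["plugins"] = plugins
    return merged


def update_file(path: Path) -> None:
    """Read clawhub.json if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

For example, update_file(Path("clawhub.json")) updates a file in the current directory and leaves other plugin entries untouched.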

Tags (AI)

#voice #gemini #assistant #multimodal
Safety Score: 4/5

Flags: network-access, file-write, file-read, external-api