Official Verified communication Safety 4/5

vibevoice

Local Spanish TTS using Microsoft VibeVoice. Generate natural voice audio from text, optimized for WhatsApp voice messages.

Why use this skill?

Generate natural-sounding Spanish voice audio locally with the vibevoice skill. Perfect for WhatsApp messages, featuring adjustable speed and high-quality opus encoding.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/javier887/vibevoice

What This Skill Does

The vibevoice skill gives the OpenClaw AI agent local Text-to-Speech (TTS), optimized for generating Spanish audio. Powered by Microsoft's VibeVoice model, it converts text strings into natural-sounding voice files. It targets WhatsApp voice messaging, producing Opus-encoded .ogg files. Unlike cloud-based APIs, vibevoice runs entirely offline once installed, which keeps your text on your own machine, avoids per-request API costs and network round-trips, and keeps the skill available without an internet connection. Adjustable speed settings and multiple voice profiles let users match the output to a specific persona or conversational context.
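The listing does not document how the skill produces its Opus-encoded .ogg output, so the following is only a sketch of one common way to do it with FFmpeg. The function name `build_opus_command`, the 32k bitrate, the mono downmix, and the use of the `atempo` filter for speed are all assumptions made for illustration, not the skill's actual pipeline.

```python
from pathlib import Path


def build_opus_command(src: str, dst: str, speed: float = 1.0,
                       bitrate: str = "32k") -> list[str]:
    """Return an ffmpeg argument list that encodes `src` as Opus in an Ogg file.

    Hypothetical helper: the real skill may encode differently.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if Path(dst).suffix.lower() != ".ogg":
        raise ValueError("destination should use the .ogg extension")

    cmd = ["ffmpeg", "-y", "-i", src]
    if speed != 1.0:
        # atempo changes playback tempo without shifting pitch.
        cmd += ["-filter:a", f"atempo={speed}"]
    # Mono Opus at a low bitrate (assumed values) is a typical choice for speech.
    cmd += ["-ac", "1", "-c:a", "libopus", "-b:a", bitrate, dst]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_opus_command("speech.wav", "speech.ogg", speed=1.2)))
```

To actually encode a file, the returned list could be passed to `subprocess.run(cmd, check=True)`, provided FFmpeg was built with libopus support.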

Installation

To integrate this skill into your local environment, ensure you have an NVIDIA GPU with approximately 2GB of dedicated VRAM, Python 3.10 or higher, and FFmpeg installed. The installation process is streamlined through the ClawHub platform. Execute the following command in your terminal:

clawhub install openclaw/skills/skills/javier887/vibevoice

Ensure that the VibeVoice repository is cloned correctly into your home directory at ~/VibeVoice. The installer will automatically configure the dependencies, including PyTorch and Torchaudio libraries required for real-time model inference.
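Before installing, it can help to confirm the prerequisites listed above. The sketch below uses only the Python standard library; the function name `check_prerequisites` is hypothetical, and the `nvidia-smi` lookup is only a rough proxy for an NVIDIA driver being present (it does not verify the roughly 2GB of VRAM).

```python
import shutil
import sys
from pathlib import Path


def check_prerequisites(repo_dir: Path = Path.home() / "VibeVoice") -> dict[str, bool]:
    """Report which of the documented prerequisites appear to be satisfied."""
    return {
        # Python 3.10 or higher is required by the listing.
        "python>=3.10": sys.version_info >= (3, 10),
        # FFmpeg must be installed and reachable on PATH.
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        # Assumption: a visible nvidia-smi suggests an NVIDIA driver is installed.
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
        # The VibeVoice repository is expected at ~/VibeVoice.
        "VibeVoice repo present": repo_dir.is_dir(),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```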

Use Cases

This skill is ideal for personal assistants that need to maintain a human-like presence on messaging platforms. Key use cases include:

  • Automating responses to WhatsApp voice notes by replying in the user's preferred language and tone.
  • Creating accessibility features for visually impaired users by converting documentation or text summaries into audio.
  • Generating localized Spanish-language notifications or alerts that sound natural rather than robotic.
  • Enhancing productivity by having long-form text documents converted into audio for on-the-go listening.

Example Prompts

  1. "Translate this text to Spanish and send it as a voice note to my friend Juan via WhatsApp: 'I will be there in ten minutes.'"
  2. "Read the summary of this report and generate a voice file with a 1.2x speed setting."
  3. "Send an audio response to the last message from Maria using the default Spanish male voice."

Tips & Limitations

  • Performance: The model achieves a real-time factor (RTF) of 0.24, meaning a 60-second message generates in roughly 14 seconds (60 × 0.24 = 14.4). Expect a brief 10-second initialization delay on first launch.
  • Content Length: For optimal quality, limit individual text inputs to 1500 characters. Longer texts should be broken into chunks to avoid audio artifacts.
  • Audio Rules: Adhere to social etiquette by only sending voice messages when requested or when responding to existing audio threads.
  • Storage: Files are saved to temporary directories by default; ensure your system permissions allow file-write access for the script directory.
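The content-length and performance tips above can be sketched in a few lines. This is an illustrative helper, not part of the skill: `chunk_text` and `estimate_generation_seconds` are hypothetical names, and splitting on sentence-ending punctuation is an assumed strategy for keeping each chunk under the 1500-character limit.

```python
import re

MAX_CHARS = 1500  # per-input limit suggested in the tips
RTF = 0.24        # real-time factor quoted in the tips


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def estimate_generation_seconds(audio_seconds: float, rtf: float = RTF) -> float:
    """Estimated generation time: audio duration multiplied by the real-time factor."""
    return audio_seconds * rtf


if __name__ == "__main__":
    parts = chunk_text("Hola, ¿cómo estás? " * 200)
    print(len(parts), "chunks, longest:", max(len(p) for p in parts))
    print(round(estimate_generation_seconds(60), 1), "seconds for 60 s of audio")
```

Each chunk would then be synthesized separately and the resulting audio files concatenated; the estimate excludes the one-time initialization delay.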

Metadata

Author: @javier887
Stars: 1947
Views: 1
Updated: 2026-03-04

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-javier887-vibevoice": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#tts #spanish #whatsapp #audio #local-ai
Safety Score: 4/5

Flags: file-write, file-read