Her Voice
Give your agent a voice. Use when the user wants the agent to speak, read aloud, or have voice responses.
Why use this skill?
Add a natural, high-performance voice to your OpenClaw agent with Her Voice. It features local Kokoro TTS, low-latency streaming, and phonetic configuration for a personalized experience.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/matusvojtek/her-voice
What This Skill Does
Her Voice is a high-performance text-to-speech (TTS) extension for the OpenClaw AI agent, designed to provide a natural, human-like voice experience. Powered by the Kokoro TTS model, it offers a compact, on-device solution that brings your agent to life without requiring external API keys or cloud subscriptions. The skill is highly optimized, utilizing MLX acceleration on Apple Silicon and PyTorch on other architectures to ensure low-latency, real-time streaming of speech. Beyond basic generation, it includes advanced features like a daemon mode to keep the model pre-warmed in RAM, a real-time visualizer for macOS, and fine-grained control over voice pitch, speed, and pronunciation. It transforms the OpenClaw agent from a simple text interface into a conversational voice assistant, perfect for users who prefer auditory feedback or need a hands-free interaction model.
Installation
Installation runs through the OpenClaw package manager. First, run the install command: clawhub install openclaw/skills/skills/matusvojtek/her-voice. Once the files are local, start the setup wizard with python3 SKILL_DIR/scripts/setup.py, where SKILL_DIR stands for the skill's install directory. The script detects your hardware, installs dependencies such as espeak-ng, downloads the Kokoro model, and configures the environment. After setup, use the scripts/config.py tool to set personal preferences, such as your agent's name and phonetic pronunciations for specific words or user names. Make sure the system can actually play audio, especially on headless Linux servers.
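The two installation steps above can be sketched as a small shell script. This is a minimal sketch, not an official installer: the `run` helper and the `DRY_RUN` variable are hypothetical names introduced here for illustration, and the default `SKILL_DIR` value is a placeholder you would replace with the real install directory.

```shell
#!/bin/sh
# Placeholder path (assumption): point this at the directory clawhub installed the skill into.
SKILL_DIR="${SKILL_DIR:-/path/to/her-voice}"
# Hypothetical safety switch: 1 only prints the commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: fetch the skill through the OpenClaw package manager.
# Step 2: run the setup wizard, but only if step 1 succeeded.
run clawhub install openclaw/skills/skills/matusvojtek/her-voice &&
    run python3 "$SKILL_DIR/scripts/setup.py"
```

With the defaults the script only prints the two commands, so you can check the resolved setup.py path first. Setting DRY_RUN=0 and a real SKILL_DIR executes them.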
Use Cases
Her Voice is ideal for users looking to create an empathetic, responsive AI companion. Use it for reading back long-form text, providing spoken notifications, or facilitating hands-free workflows while you are away from the keyboard. It is particularly useful for accessibility purposes, providing clear, natural-sounding audio for users with visual impairments. Developers can also leverage the ability to output to audio files (.wav) for prototyping or creating voiceovers for media projects directly from the command line.
Example Prompts
- "OpenClaw, read out loud the last email I received from the project manager."
- "Please speak faster, Jackie, and tell me a short story about the history of artificial intelligence."
- "Summarize the current weather report and read it to me using a quick, energetic voice."
Tips & Limitations
For the best audio experience, experiment with the --speed flag; slight adjustments (e.g., 1.1x) can make the voice sound more conversational. If the TTS mispronounces a name, use the user_name_tts setting to provide a phonetic override. The real-time visualizer is available only on macOS. Although the model is compact, the TTS daemon keeps it loaded in RAM while active, so if you notice high memory usage you can disable daemon mode to reclaim memory when the agent is idle.
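One way to picture a phonetic override is as a text substitution applied before the text reaches the speech engine. The sketch below is a conceptual illustration only, not how Her Voice implements user_name_tts. The name, its respelling, and the `apply_phonetic_override` function are all hypothetical.

```shell
#!/bin/sh
# Conceptual illustration only (assumption): not Her Voice's actual mechanism.
# Hypothetical name and phonetic respelling chosen for the example.
written_name="Siobhan"
spoken_name="Shi-vawn"

# Replace the written form with the respelling so a TTS engine would read it as intended.
apply_phonetic_override() {
    printf '%s\n' "$1" | sed "s/$written_name/$spoken_name/g"
}

apply_phonetic_override "Good morning, Siobhan."
```

The on-screen text can keep the correct spelling, while only the string handed to the speech engine carries the respelling.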
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-matusvojtek-her-voice": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: file-write, file-read, code-execution
Related Skills
Briefing Room
Daily news briefing generator — produces a conversational radio-host-style audio briefing + DOCX document covering weather, X/Twitter trends, web trends, world news, politics, tech, local news, sports, markets, and crypto. macOS only (uses Apple TTS and afplay). Use when user asks for a news briefing, morning briefing, daily update, or similar.
TubeScribe
YouTube video summarizer with speaker detection, formatted documents, and audio output. Works out of the box with macOS built-in TTS. Optional recommended tools (pandoc, ffmpeg, mlx-audio) enhance quality. Requires internet for YouTube access. No paid APIs or subscriptions. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video.