voicebox-voice-synthesis
Expert skill for Voicebox, the open-source local voice cloning and TTS studio built with Tauri, React, and FastAPI
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/adisinghstudent/voicebox-voice-synthesis
What This Skill Does
The voicebox-voice-synthesis skill connects the OpenClaw AI agent to the Voicebox local TTS studio, so users can generate high-quality cloned or synthetic voice audio on their own hardware. The skill talks to Voicebox's FastAPI backend on port 17493 rather than to a paid cloud API such as ElevenLabs, which keeps the whole voice-generation pipeline local and private. It supports multi-engine selection, paralinguistic tags (for example, [laugh]), and multiple languages, which makes it useful to developers and content creators who need human-like speech in their workflows.
Installation
To get started, make sure the Voicebox desktop application is running on your machine (available from voicebox.sh or via Docker). Once the local server is listening on localhost:17493, install the skill from the OpenClaw terminal: clawhub install openclaw/skills/skills/adisinghstudent/voicebox-voice-synthesis. The skill detects the local API automatically, so your agent can start sending synthesis requests without further configuration.
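Before installing, it can help to confirm that something is actually listening on the port mentioned above. The following is a minimal sketch using only the Python standard library; it checks TCP reachability and nothing more, so it does not prove that the listener is Voicebox. The function name is_backend_reachable is a hypothetical helper, not part of Voicebox or OpenClaw.

```python
import socket

VOICEBOX_HOST = "localhost"  # host named in the installation text
VOICEBOX_PORT = 17493        # port named in the installation text


def is_backend_reachable(host: str = VOICEBOX_HOST,
                         port: int = VOICEBOX_PORT,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: it only tests that the port accepts connections,
    not that the service behind it is Voicebox.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


if __name__ == "__main__":
    if is_backend_reachable():
        print("A service is accepting connections on port 17493.")
    else:
        print("Nothing answered on port 17493; is Voicebox running?")
```

If the check fails while the desktop application is open, a firewall rule or a different configured port is the likely cause.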
Use Cases
This skill is ideal for:
- Accessibility Tools: Generating real-time audio descriptions for vision-impaired users.
- Content Creation: Automating the creation of narration, voiceovers for local video projects, or audiobooks without external subscriptions.
- Agent Personas: Giving your AI agent a distinct, custom-cloned voice and persona to improve user engagement and immersion.
- Prototyping: Rapidly testing voice-enabled interfaces locally without incurring high API costs or data privacy risks.
Example Prompts
- "Voicebox, generate a greeting for my video using the qwen3-tts engine: 'Welcome to the future of local AI.'"
- "Using the Chatterbox Turbo engine, synthesize this text with a laughing tag: 'That is truly incredible [laugh] I never expected this result.'"
- "List all available voice profiles currently stored in my local Voicebox library so I can choose the best one for my narrator."
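The second prompt above embeds a paralinguistic tag, [laugh], directly in the text. As a rough sketch, such tags can be located or stripped with a regular expression before the text is handed to an engine. The pattern assumes tags are lowercase words in square brackets, as in that one example; the actual tag vocabulary and syntax each engine accepts are not specified here, and both function names are hypothetical.

```python
import re

# Assumption: a tag is a lowercase word (letters or underscores) in square
# brackets, modeled on the single "[laugh]" example in the prompts above.
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def extract_tags(text: str) -> list[str]:
    """Return the tag names found in text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def strip_tags(text: str) -> str:
    """Remove tags and collapse the whitespace they leave behind."""
    without_tags = TAG_PATTERN.sub("", text)
    return " ".join(without_tags.split())


if __name__ == "__main__":
    prompt = "That is truly incredible [laugh] I never expected this result."
    print(extract_tags(prompt))
    print(strip_tags(prompt))
```

Stripping could be useful when the same script is sent to an engine that does not interpret tags, so the bracketed text is not read aloud literally.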
Tips & Limitations
To achieve the best results, ensure your hardware meets the requirements for the specific TTS engine selected. While Qwen3-TTS provides excellent general-purpose output, Chatterbox Turbo is highly recommended for emotive speech. Note that the skill relies on the availability of the local backend; if you encounter errors, verify that port 17493 is not blocked by your firewall and that the Voicebox application is active. Keep in mind that heavy concurrent synthesis might impact system performance, particularly on machines without dedicated CUDA-enabled GPUs.
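One way to limit the performance impact of concurrent synthesis is to cap how many requests run at once. The sketch below does this with a bounded thread pool from the standard library. The synthesize callable is a placeholder supplied by the caller; how a request is actually sent to Voicebox is not described here, so the demo uses a stub that makes no network call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_bounded(synthesize: Callable[[str], bytes],
                texts: Iterable[str],
                max_parallel: int = 1) -> list[bytes]:
    """Apply synthesize to each text with at most max_parallel calls in flight.

    Results come back in the same order as the input texts.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(synthesize, texts))


if __name__ == "__main__":
    # Stub standing in for a real synthesis call (hypothetical).
    def fake_synthesize(text: str) -> bytes:
        return text.encode("utf-8")

    clips = run_bounded(fake_synthesize, ["Hello.", "Goodbye."], max_parallel=1)
    print(len(clips), "clips produced")
```

A max_parallel of 1 serializes requests, which is a cautious default for machines without a dedicated GPU; raise it only if the hardware keeps up.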
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-adisinghstudent-voicebox-voice-synthesis": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: network-access, file-read, file-write
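If clawhub.json already contains other plugins, pasting the fragment above over the file would discard them. The following sketch merges the entry into an existing configuration instead. It assumes the file is plain JSON with a top-level "plugins" object, as the fragment suggests; the rest of the file's schema is unknown, and enable_plugin is a hypothetical helper, not a clawhub command.

```python
import json

# Plugin key copied from the clawhub.json fragment above.
PLUGIN_ID = "official-adisinghstudent-voicebox-voice-synthesis"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of config with plugin_id enabled; other entries are kept."""
    # Round-trip through JSON to deep-copy JSON-compatible data.
    merged = json.loads(json.dumps(config))
    plugins = merged.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    return merged


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated plugin.
    existing = {"plugins": {"some-other-plugin": {"enabled": False}}}
    print(json.dumps(enable_plugin(existing), indent=2))
```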