Official Verified media Safety 4/5

voicebox-voice-synthesis

Expert skill for Voicebox — the open-source local voice cloning and TTS studio built with Tauri, React, and FastAPI

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/adisinghstudent/voicebox-voice-synthesis

What This Skill Does

The voicebox-voice-synthesis skill connects the OpenClaw AI agent to the Voicebox local TTS studio, so users can generate cloned or synthetic voice audio directly on their own hardware. The skill talks to Voicebox's FastAPI backend on port 17493, so no paid cloud API such as ElevenLabs is required and the whole voice generation pipeline stays on the local machine. It supports multi-engine selection, paralinguistic tags (such as [laugh]), and multiple languages, which makes it useful for developers and content creators who need to integrate human-like speech into their workflows.

Installation

To get started, make sure the Voicebox desktop application is running on your machine (downloadable from voicebox.sh or via Docker). Once the local server is listening on localhost:17493, install the skill from the OpenClaw terminal: clawhub install openclaw/skills/skills/adisinghstudent/voicebox-voice-synthesis. The skill automatically detects the local API, so your agent can start sending synthesis requests immediately without further configuration.
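
Before installing, it can help to confirm that something is actually listening on the port named above. The following is a minimal sketch using only the Python standard library; it checks TCP reachability and nothing more. The function name `is_port_open` is hypothetical, and the check makes no assumption about which HTTP routes Voicebox exposes.

```python
import socket

VOICEBOX_HOST = "localhost"
VOICEBOX_PORT = 17493  # port named in the installation text


def is_port_open(host: str = VOICEBOX_HOST,
                 port: int = VOICEBOX_PORT,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

Calling `is_port_open()` with no arguments probes localhost:17493. A `False` result suggests the Voicebox application is not running or the port is blocked; a `True` result only shows that some process accepts connections there, not that it is Voicebox.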

Use Cases

This skill is ideal for:

Accessibility Tools: Generating real-time audio descriptions for vision-impaired users.
Content Creation: Automating the creation of narration, voiceovers for local video projects, or audiobooks without external subscriptions.
Agent Personas: Giving your AI agent a distinct custom or cloned voice to improve user engagement and immersion.
Prototyping: Rapidly testing voice-enabled interfaces locally without incurring high API costs or data privacy risks.

Example Prompts

"Voicebox, generate a greeting for my video using the qwen3-tts engine: 'Welcome to the future of local AI.'"
"Using the Chatterbox Turbo engine, synthesize this text with a laughing tag: 'That is truly incredible [laugh] I never expected this result.'"
"List all available voice profiles currently stored in my local Voicebox library so I can choose the best one for my narrator."

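The second prompt above embeds a paralinguistic tag in square brackets. As a rough sketch of how a caller might separate such tags from the spoken text before deciding which engine to use, the helper below extracts them with a regular expression. The helper name `split_tags` is hypothetical, and the tag syntax is an assumption inferred from the single `[laugh]` example; the set of tags each engine accepts is not stated here.

```python
import re

# Matches bracketed lowercase tags such as "[laugh]".
# Assumption: tags follow the form shown in the example prompt.
TAG_PATTERN = re.compile(r"\[([a-z][a-z_-]*)\]")


def split_tags(text: str) -> tuple[str, list[str]]:
    """Return (text with tags removed, tag names in order of appearance)."""
    tags = TAG_PATTERN.findall(text)
    plain = TAG_PATTERN.sub("", text)
    # Collapse the double space left behind where a tag was removed.
    plain = re.sub(r"\s{2,}", " ", plain).strip()
    return plain, tags
```

For the example sentence, `split_tags` yields the tag list `["laugh"]` along with the sentence stripped of the tag, which could be used to warn the user when a tagged prompt is sent to an engine without tag support.
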
Tips & Limitations

For best results, make sure your hardware meets the requirements of the TTS engine you select. Qwen3-TTS provides excellent general-purpose output, while Chatterbox Turbo is recommended for emotive speech. The skill depends on the local backend being available: if you encounter errors, verify that the Voicebox application is running and that port 17493 is not blocked by your firewall. Heavy concurrent synthesis can degrade system performance, particularly on machines without a dedicated CUDA-enabled GPU.

Metadata

Author: @adisinghstudent

Stars: 3809

Updated: 2026-04-05

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-adisinghstudent-voicebox-voice-synthesis": {
      "enabled": true,
      "auto_update": true
    }
  }
}
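
If clawhub.json already lists other plugins, pasting the snippet over the file would discard them. The sketch below merges the entry into an existing configuration instead. It assumes the file is plain JSON with a top-level "plugins" object, as in the snippet above; the function names and the default file path are hypothetical.

```python
import json

PLUGIN_NAME = "official-adisinghstudent-voicebox-voice-synthesis"


def enable_plugin(config: dict, name: str = PLUGIN_NAME) -> dict:
    """Return a copy of config with the plugin enabled, keeping other entries."""
    merged = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    plugins = merged.setdefault("plugins", {})
    entry = plugins.setdefault(name, {})
    entry.update({"enabled": True, "auto_update": True})
    return merged


def enable_plugin_in_file(path: str = "clawhub.json") -> None:
    """Read path (treated as empty if missing), merge the entry, write it back."""
    try:
        with open(path, encoding="utf-8") as fh:
            config = json.load(fh)
    except FileNotFoundError:
        config = {}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(enable_plugin(config), fh, indent=2)
        fh.write("\n")
```

`enable_plugin` leaves its input untouched, so the merged result can be inspected before `enable_plugin_in_file` writes anything to disk.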

Tags (AI)

#tts #voice-cloning #local-ai #audio #synthesis

Safety Score: 4/5

Flags: network-access, file-read, file-write