ClawKit Reliability Toolkit
Official Verified media Safety 4/5

voicebox-voice-synthesis

Expert skill for Voicebox — the open-source local voice cloning and TTS studio built with Tauri, React, and FastAPI


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/adisinghstudent/voicebox-voice-synthesis

What This Skill Does

The voicebox-voice-synthesis skill connects the OpenClaw AI agent to Voicebox, a local TTS studio, so the agent can generate cloned or synthetic voice audio on your own hardware. Requests go to the Voicebox FastAPI backend on port 17493 rather than to a paid cloud API such as ElevenLabs, so text and audio stay on the local machine. The skill supports several synthesis engines, paralinguistic tags (for example [laugh]), and multiple languages, which makes it useful for developers and content creators who want to add human-like speech to their workflows.

Installation

Before installing, make sure the Voicebox desktop application is running (download it from voicebox.sh, or run it via Docker) and that its local server is listening on localhost:17493. Then install the skill from the OpenClaw terminal: clawhub install openclaw/skills/skills/adisinghstudent/voicebox-voice-synthesis. The skill detects the local API automatically, so the agent can send synthesis requests right away without further configuration.
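
A quick way to confirm the server is up before installing is to check whether anything accepts TCP connections on localhost:17493. The sketch below uses only Python's standard library; the function name voicebox_port_open is hypothetical, and a successful connection only shows that some process is listening on that port, not that the process is Voicebox.

```python
import socket


def voicebox_port_open(host: str = "127.0.0.1", port: int = 17493, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it only proves that a process is listening on the
    port; it does not confirm that the process is the Voicebox backend.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if voicebox_port_open():
        print("Port 17493 is accepting connections.")
    else:
        print("Nothing is listening on localhost:17493; start Voicebox first.")
```

If the check fails while Voicebox appears to be running, a firewall rule blocking the port is a likely cause.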

Use Cases

This skill is ideal for:

  • Accessibility Tools: Generating real-time audio descriptions for vision-impaired users.
  • Content Creation: Automating the creation of narration, voiceovers for local video projects, or audiobooks without external subscriptions.
  • Agent Personas: Giving your AI agent a distinct custom or cloned voice to improve user engagement and immersion.
  • Prototyping: Testing voice-enabled interfaces locally without per-request API costs or sending data to a third-party service.

Example Prompts

  • "Voicebox, generate a greeting for my video using the qwen3-tts engine: 'Welcome to the future of local AI.'"
  • "Using the Chatterbox Turbo engine, synthesize this text with a laughing tag: 'That is truly incredible [laugh] I never expected this result.'"
  • "List all available voice profiles currently stored in my local Voicebox library so I can choose the best one for my narrator."
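
Paralinguistic tags such as [laugh] are embedded directly in the text to be spoken. Before sending text to an engine, it can help to list the tags it contains and compare them with the tags that engine supports. This Python sketch assumes tags are lowercase words in square brackets, as in the Chatterbox Turbo prompt above; the helper names are hypothetical, and the supported set has to come from the engine's own documentation, since this page does not list it.

```python
import re

# Matches bracketed words such as "[laugh]". Assumption: tags are single
# lowercase words (letters or underscores) inside square brackets.
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(text: str) -> list[str]:
    """Return the bracketed tags in text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def unsupported_tags(text: str, supported: set[str]) -> list[str]:
    """Return tags in text that are missing from the caller-supplied supported set."""
    return [tag for tag in find_tags(text) if tag not in supported]


if __name__ == "__main__":
    prompt = "That is truly incredible [laugh] I never expected this result."
    print(find_tags(prompt))
    print(unsupported_tags(prompt, supported={"laugh"}))
```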

Tips & Limitations

For best results, make sure your hardware meets the requirements of the TTS engine you select. Qwen3-TTS gives strong general-purpose output, while Chatterbox Turbo is recommended for emotive speech. The skill depends on the local backend being available: if you encounter errors, check that the Voicebox application is running and that your firewall is not blocking port 17493. Running many synthesis jobs at once can slow the whole system, particularly on machines without a dedicated CUDA-capable GPU.

Metadata

Stars: 3809
Views: 2
Updated: 2026-04-05
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-adisinghstudent-voicebox-voice-synthesis": {
      "enabled": true,
      "auto_update": true
    }
  }
}
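
If clawhub.json already has a "plugins" object, the entry above needs to be merged into it rather than pasted as a second "plugins" key. The Python sketch below does that merge while keeping other top-level keys and other plugins intact. It assumes clawhub.json is plain JSON with a top-level "plugins" object, as in the snippet; the function name enable_voicebox_plugin is hypothetical.

```python
import json
from pathlib import Path

# Plugin key and settings taken from the configuration snippet above.
PLUGIN_ID = "official-adisinghstudent-voicebox-voice-synthesis"


def enable_voicebox_plugin(config_path: Path) -> dict:
    """Add or update the Voicebox plugin entry in a clawhub.json file.

    Hypothetical helper. Other top-level keys, other plugins, and any extra
    keys already on this plugin's entry are preserved. A missing file is
    treated as an empty configuration.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    plugins = config.setdefault("plugins", {})
    entry = plugins.setdefault(PLUGIN_ID, {})
    entry.update({"enabled": True, "auto_update": True})
    # Rewrite the file with stable two-space indentation.
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, enable_voicebox_plugin(Path("clawhub.json")) updates a clawhub.json in the current directory, or creates it if it does not exist.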

Tags (AI)

#tts #voice-cloning #local-ai #audio #synthesis
Safety Score: 4/5

Flags: network-access, file-read, file-write