What This Skill Does

The chichi-speech skill is a FastAPI-based REST service for text-to-speech (TTS) synthesis powered by the Qwen3 model. Unlike generic TTS engines, it is optimized for voice cloning from a pre-computed reference audio prompt. Because generation is anchored to a specific audio sample and its transcript, the output stays consistent in timbre, prosody, and emotional tone across requests. The service is packaged as an installable CLI, which makes it easy to deploy in an OpenClaw environment.
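The idea of a pre-computed reference prompt can be sketched as follows. This is an illustration of the compute-once, reuse-many pattern only, not the service's actual code: VoicePrompt, build_voice_prompt, and synthesize are hypothetical names, and the "expensive" step is a placeholder.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class VoicePrompt:
    """Hypothetical stand-in for features extracted from a reference clip."""
    audio_url: str
    transcript: str


@lru_cache(maxsize=4)
def build_voice_prompt(audio_url: str, transcript: str) -> VoicePrompt:
    """Placeholder for the costly reference analysis.

    lru_cache keys on (audio_url, transcript), so the body runs once
    per distinct reference and later calls return the cached object.
    """
    return VoicePrompt(audio_url, transcript)


def synthesize(text: str, audio_url: str, transcript: str) -> str:
    """Placeholder synthesis that reuses the cached prompt."""
    prompt = build_voice_prompt(audio_url, transcript)
    return f"<audio for {text!r} in the voice of {prompt.audio_url}>"
```

Calling synthesize repeatedly with the same reference triggers only one prompt build; a different reference triggers a new one, which is why frequent switching costs time.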

Installation

To get started, ensure you have Python 3.10 or higher installed on your system. From your terminal, run the following command to install the skill:

clawhub install openclaw/skills/skills/hudeven/chichi-speech

Once installed, you can initialize the server using the provided CLI tool. The service defaults to port 9090. For custom voice cloning, you must provide a reference audio file URL and the exact text transcript of that audio when launching the service.
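Once the server is running, a client might talk to it along these lines. This is a minimal sketch using only the Python standard library: port 9090 comes from the text above, but the /synthesize path and the {"text": ...} payload are assumptions, so check the service's own API documentation for the real route and schema (FastAPI apps commonly expose interactive docs at /docs).

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9090"  # 9090 is the documented default port
SYNTHESIZE_PATH = "/synthesize"     # assumption: hypothetical route name


def build_tts_request(text: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a JSON POST request carrying the text."""
    payload = json.dumps({"text": text}).encode("utf-8")  # assumed schema
    return Request(
        base_url + SYNTHESIZE_PATH,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text: str, out_path: str) -> None:
    """Send the request and write the response body to out_path.

    Assumes the service answers with raw audio bytes; requires a
    running server, so it is not called at import time.
    """
    with urlopen(build_tts_request(text)) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

With a server listening locally, save_speech("The system has successfully processed your request.", "status_update.wav") would write the returned audio to disk, provided the assumed route and payload match the real service.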

Use Cases

Personalized AI Avatars: Create a consistent voice identity for your custom AI agents.
Content Creation: Generate high-quality voiceovers for videos or presentations using a specific brand voice.
Accessibility Tools: Build applications that read text aloud in a natural, human-like voice.
Interactive Prototypes: Use the service to give a voice to prototypes that require high-fidelity audio output.

Example Prompts

"Generate a greeting for my welcome video saying: 'Welcome to the platform, we are excited to have you here.'"
"Convert this text to audio using my current reference voice: 'The system has successfully processed your request.'"
"Generate audio for the following text and save it as status_update.wav: 'Please ensure all credentials are updated before proceeding.'"

Tips & Limitations

Performance: Because the reference audio is pre-computed, subsequent speech requests skip that step and return quickly. Avoid switching reference audio files frequently to keep this benefit.
Audio Quality: Always use clear, high-quality reference audio (WAV format recommended) for the best cloning results. Background noise in the reference sample tends to carry over into the cloned output.
Network: The service requires network access to fetch the reference audio file if provided via a remote URL. Ensure your environment has proper access permissions.
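Before handing a reference clip to the service, it can help to confirm that the file really is a readable WAV and to look at its basic properties. The sketch below uses Python's standard wave module; describe_wav is a hypothetical helper, and it only reports what the file contains. It does not judge noise or cloning quality, and the service's own format requirements are not stated here.

```python
import wave


def describe_wav(path_or_file) -> dict:
    """Return basic properties of an uncompressed PCM WAV file.

    Accepts a filesystem path or a binary file-like object.
    Raises wave.Error if the data is not a WAV file the module can read.
    """
    with wave.open(path_or_file, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }
```

For example, describe_wav("reference.wav") on a local file (the filename is illustrative) returns the channel count, sample rate, bit depth, and duration, which makes it easy to spot a truncated or wrongly encoded clip before launching the server.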
