chichi-speech
A RESTful service for high-quality text-to-speech using Qwen3 and specialized voice cloning. Optimized for reusing a specific voice prompt to avoid re-computation.
Why use this skill?
Deploy a powerful Qwen3-based text-to-speech service with voice cloning. Efficient, consistent, and easy to use for professional AI-generated audio output.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/hudeven/chichi-speech
What This Skill Does
The chichi-speech skill is a specialized FastAPI-based REST service designed to deliver high-quality text-to-speech (TTS) synthesis powered by the Qwen3 model. Unlike generic TTS engines, this service is specifically optimized for voice cloning using a pre-computed reference audio prompt. By anchoring the voice generation to a specific audio sample and its associated text, the service ensures that generated speech is not only high-fidelity but also consistent in timbre, prosody, and emotional tone across multiple requests. It is packaged as an installable CLI, making it easy to deploy within your OpenClaw environment.
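The reuse idea described above can be illustrated with a minimal sketch. This is not the skill's actual implementation: `VoicePromptCache` and `compute_prompt` are hypothetical names, and the expensive step that turns reference audio and its transcript into a voice prompt is replaced by a stand-in function supplied by the caller.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class VoicePromptCache:
    """Computes a voice prompt once per (audio, transcript) pair and reuses it."""

    # Hypothetical stand-in for the expensive prompt computation.
    compute_prompt: Callable[[bytes, str], Any]
    _cache: Dict[str, Any] = field(default_factory=dict, init=False)

    @staticmethod
    def _key(ref_audio: bytes, ref_text: str) -> str:
        # Hash audio and transcript together so either change yields a new key.
        digest = hashlib.sha256(ref_audio)
        digest.update(b"\x00")
        digest.update(ref_text.encode("utf-8"))
        return digest.hexdigest()

    def get(self, ref_audio: bytes, ref_text: str) -> Any:
        key = self._key(ref_audio, ref_text)
        if key not in self._cache:
            # Only the first request for a given reference pays this cost.
            self._cache[key] = self.compute_prompt(ref_audio, ref_text)
        return self._cache[key]


calls = []


def fake_compute(audio: bytes, text: str) -> dict:
    # Records each invocation so reuse can be observed.
    calls.append(text)
    return {"audio_bytes": len(audio), "text": text}


cache = VoicePromptCache(fake_compute)
first = cache.get(b"\x00\x01", "hello")
second = cache.get(b"\x00\x01", "hello")
```

In this sketch, the second `get` call returns the stored object without invoking `fake_compute` again, which is the behavior the service relies on when the same reference voice serves many requests.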
Installation
To get started, ensure you have Python 3.10 or higher installed on your system. From your terminal, run the following command to install the skill:
clawhub install openclaw/skills/skills/hudeven/chichi-speech
Once installed, you can initialize the server using the provided CLI tool. The service defaults to port 9090. For custom voice cloning, you must provide a reference audio file URL and the exact text transcript of that audio when launching the service.
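A client call against the running service might look like the sketch below. Only the default port 9090 comes from this page. The route `/synthesize`, the JSON field name `text`, and the assumption that the response body is raw audio are all hypothetical; since the service is FastAPI-based, its generated API docs (typically served at `/docs`, if enabled) are the place to confirm the real route and schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9090"  # 9090 is the documented default port
SYNTH_PATH = "/synthesize"          # hypothetical route, not confirmed by the source


def build_tts_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Builds (but does not send) a JSON POST request for one utterance."""
    # The "text" field name is an assumption about the request schema.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + SYNTH_PATH,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, out_path: str, timeout: float = 60.0) -> int:
    """Sends the request and writes the response body to out_path.

    Assumes the service answers with raw audio bytes. Returns the byte count.
    """
    req = build_tts_request(text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

With the server running, a call such as `synthesize_to_file("The system has successfully processed your request.", "status_update.wav")` would save the result locally, provided the assumed route and schema match the real ones.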
Use Cases
- Personalized AI Avatars: Create a consistent voice identity for your custom AI agents.
- Content Creation: Generate high-quality voiceovers for videos or presentations using a specific brand voice.
- Accessibility Tools: Build applications that read text aloud in a natural, human-like voice.
- Interactive Prototypes: Use the service to give a voice to prototypes that require high-fidelity audio output.
Example Prompts
- "Generate a greeting for my welcome video saying: 'Welcome to the platform, we are excited to have you here.'"
- "Convert this text to audio using my current reference voice: 'The system has successfully processed your request.'"
- "Save an audio file of the following text: 'Please ensure all credentials are updated before proceeding.' and save it as status_update.wav"
Tips & Limitations
- Performance: Because the voice prompt is computed once from the reference audio and then reused, later requests skip that step and return faster. Avoid switching reference audio files frequently, since each switch gives up this benefit.
- Audio Quality: Always use clear, high-quality reference audio (WAV format recommended) for the best cloning results. Background noise in your reference sample will be captured in the cloned output.
- Network: The service requires network access to fetch the reference audio file if provided via a remote URL. Ensure your environment has proper access permissions.
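Before pointing the service at a reference sample, it can help to check the file's basic properties. The sketch below uses Python's standard `wave` module to read a WAV header. The helper names are illustrative, and what counts as an acceptable sample rate or duration is left to you, since this page does not state any thresholds.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Returns basic header facts for uncompressed WAV data."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Builds a mono 16-bit silent WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


info = describe_wav(make_silence(2.0))
```

For a real sample, read the file's bytes (for example with `open(path, "rb").read()`) and pass them to `describe_wav`. This check covers format details only and says nothing about background noise, which still needs a listen.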
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-hudeven-chichi-speech": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: network-access, file-write