Official · Verified · media · Safety 4/5

chichi-speech

A RESTful service for high-quality text-to-speech using Qwen3 and specialized voice cloning. It is optimized for reusing a single voice prompt so that the prompt is not recomputed on every request.

Why use this skill?

Deploy a Qwen3-based text-to-speech service with voice cloning. Because the voice prompt is computed once and reused, output stays consistent across requests and repeated synthesis avoids redundant work.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/hudeven/chichi-speech

What This Skill Does

The chichi-speech skill is a specialized FastAPI-based REST service designed to deliver high-quality text-to-speech (TTS) synthesis powered by the Qwen3 model. Unlike generic TTS engines, this service is specifically optimized for voice cloning using a pre-computed reference audio prompt. By anchoring the voice generation to a specific audio sample and its associated text, the service ensures that generated speech is not only high-fidelity but also consistent in timbre, prosody, and emotional tone across multiple requests. It is packaged as an installable CLI, making it easy to deploy within your OpenClaw environment.

Installation

To get started, ensure you have Python 3.10 or higher installed on your system. From your terminal, run the following command to install the skill:

clawhub install openclaw/skills/skills/hudeven/chichi-speech

Once installed, you can initialize the server using the provided CLI tool. The service defaults to port 9090. For custom voice cloning, you must provide a reference audio file URL and the exact text transcript of that audio when launching the service.
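
As a rough sketch of how a client might call the running service, the snippet below builds a JSON POST request against the default port 9090 using only the Python standard library. The `/synthesize` path, the `text` field name, and the assumption that the response body is raw audio bytes are all hypothetical; check the skill's own documentation for the actual route and payload.

```python
import json
import urllib.request

DEFAULT_PORT = 9090  # default port stated in the installation notes


def build_tts_request(text, host="localhost", port=DEFAULT_PORT):
    """Build a POST request for the TTS service.

    ASSUMPTION: the "/synthesize" route and the {"text": ...} payload are
    hypothetical placeholders, not the skill's documented API.
    """
    url = f"http://{host}:{port}/synthesize"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path, host="localhost", port=DEFAULT_PORT):
    """Send the request and write the response body to out_path.

    ASSUMPTION: the service answers with raw audio bytes.
    """
    request = build_tts_request(text, host, port)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return out_path
```

With the server running, a call such as `synthesize_to_file("Welcome to the platform.", "welcome.wav")` would write the returned audio to disk, provided the route and payload match what the service actually exposes.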

Use Cases

  • Personalized AI Avatars: Create a consistent voice identity for your custom AI agents.
  • Content Creation: Generate high-quality voiceovers for videos or presentations using a specific brand voice.
  • Accessibility Tools: Build applications that read text aloud in a natural, human-like voice.
  • Interactive Prototypes: Use the service to give a voice to prototypes that require high-fidelity audio output.

Example Prompts

  1. "Generate a greeting for my welcome video saying: 'Welcome to the platform, we are excited to have you here.'"
  2. "Convert this text to audio using my current reference voice: 'The system has successfully processed your request.'"
  3. "Generate audio for the following text and save it as status_update.wav: 'Please ensure all credentials are updated before proceeding.'"

Tips & Limitations

  • Performance: Because the reference audio prompt is computed once and reused, subsequent requests skip that step and return faster. Switching to a different reference audio file forces the prompt to be recomputed, so avoid changing it frequently.
  • Audio Quality: Always use clear, high-quality reference audio (WAV format recommended) for the best cloning results. Background noise in your reference sample will be captured in the cloned output.
  • Network: The service requires network access to fetch the reference audio file if provided via a remote URL. Ensure your environment has proper access permissions.
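
The performance tip can be illustrated with a small caching sketch. This is not the skill's actual implementation: `get_voice_prompt` and `synthesize` are hypothetical names, and the "expensive" step is replaced by a placeholder. It only shows why keeping the same reference audio avoids recomputation while switching references does not.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_voice_prompt(ref_audio_url, ref_text):
    """Stand-in for the expensive reference-audio processing step.

    ASSUMPTION: a real service would fetch the audio and run the model
    here; this placeholder just returns the inputs as a tuple.
    """
    return (ref_audio_url, ref_text)


def synthesize(text, ref_audio_url, ref_text):
    """Reuse the cached prompt; only the new text varies per request."""
    prompt = get_voice_prompt(ref_audio_url, ref_text)
    return {"text": text, "prompt": prompt}
```

With a cache of size one, repeated calls that share the same reference hit the cache, while alternating between two references evicts the cached prompt each time and recomputes it on every call.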

Metadata

Author: @hudeven
Stars: 2387
Views: 1
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-hudeven-chichi-speech": {
      "enabled": true,
      "auto_update": true
    }
  }
}
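
If clawhub.json already contains other plugins, pasting the snippet by hand risks overwriting them. A minimal sketch of merging the entry programmatically follows; the helper names are hypothetical, and the default file path is an assumption about where your configuration lives.

```python
import json
from pathlib import Path

PLUGIN_NAME = "official-hudeven-chichi-speech"  # key from the snippet above


def enable_plugin(config, name=PLUGIN_NAME):
    """Return a copy of config with the plugin entry added or updated."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    entry = dict(plugins.get(name, {}))
    entry.update({"enabled": True, "auto_update": True})
    plugins[name] = entry
    merged["plugins"] = plugins
    return merged


def update_config_file(path="clawhub.json"):
    """Read, merge, and rewrite the config file.

    ASSUMPTION: the file lives at the given path; adjust for your setup.
    """
    config_path = Path(path)
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

`enable_plugin` leaves existing plugin entries and unrelated top-level keys untouched, so it can be applied repeatedly without side effects.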

Tags (AI)

#tts #voice-cloning #qwen #audio-synthesis #rest-api
Safety Score: 4/5

Flags: network-access, file-write