Official Verified media Safety 4/5

cosyvoice3

Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon. Supports Chinese, English, Japanese, Korean, and 18+ Chinese dialects. Provides zero-shot voice cloning, cross-lingual synthesis, and fine-grained control. Use when: (1) the user requests local TTS with high-quality Chinese or English voices; (2) voice cloning from reference audio is needed; (3) offline, on-device TTS is required; (4) the user wants natural-sounding speech with emotion or dialect control.

Why use this skill?

High-quality local TTS for Apple Silicon. Supports 9 languages, 18+ Chinese dialects, and zero-shot voice cloning with fine-grained emotional control.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/lhuaizhong/cosyvoice3-macos


What This Skill Does

CosyVoice3 is a text-to-speech (TTS) engine by Alibaba, packaged here to run on Apple Silicon (M1/M2/M3) hardware through OpenClaw. This skill provides high-fidelity, natural-sounding speech synthesis in 9 major languages (including English, Chinese, Japanese, and Korean) and more than 18 distinct Chinese dialects. Unlike cloud-based TTS services, it runs entirely on your machine, so your text and audio stay local and synthesis works offline.

The skill offers zero-shot voice cloning: it can synthesize speech that mimics a specific person's timbre from only 3-10 seconds of reference audio. It also supports cross-lingual synthesis, meaning you can generate English speech using a Chinese voice profile, and it gives fine-grained control over prosody, speed, and emotional inflection through text-based tags.
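Because cloning depends on a reference clip of 3-10 seconds, it can help to check a clip's length before handing it to the skill. The following is a minimal sketch using only Python's standard `wave` module. It assumes the reference is an uncompressed PCM WAV file, and the function names are illustrative rather than part of the skill.

```python
import wave

MIN_SECONDS = 3.0   # lower bound quoted in the description above
MAX_SECONDS = 10.0  # upper bound quoted in the description above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the 3-10 second window."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

This checks duration only. It says nothing about background noise or clarity, which still need to be judged by ear.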

Installation

To install this skill, run the following command in your OpenClaw environment:

clawhub install openclaw/skills/skills/lhuaizhong/cosyvoice3-macos

After installation, navigate to the skill's scripts directory and run bash install.sh. The author's documentation gives the directory as /Users/lhz/.openclaw/workspace/skills/cosyvoice3/scripts, which reflects the author's own home folder; on your machine the equivalent path is presumably under your own home directory. The script sets up a dedicated Conda environment, configures the PyTorch dependencies for Apple Silicon, and downloads the Fun-CosyVoice3-0.5B model weights.
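Before running the installer, a quick preflight check can confirm that the machine matches what the skill expects. The sketch below is not part of the skill. It assumes that "Apple Silicon" can be detected as macOS reporting an `arm64` machine type, and it uses the 5GB free-space figure from the tips section as its threshold.

```python
import platform
import shutil
from pathlib import Path

REQUIRED_FREE_GB = 5  # free-space figure taken from the tips section


def is_apple_silicon(system: str, machine: str) -> bool:
    """True for macOS ("Darwin") on an arm64 CPU."""
    return system == "Darwin" and machine == "arm64"


def free_gb(path: str) -> float:
    """Free space, in decimal gigabytes, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def preflight(path: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not is_apple_silicon(platform.system(), platform.machine()):
        problems.append("this machine does not report macOS on arm64")
    if free_gb(path) < REQUIRED_FREE_GB:
        problems.append(f"less than {REQUIRED_FREE_GB} GB free at {path}")
    return problems


if __name__ == "__main__":
    # Check the volume holding the home directory, where the skill installs.
    for problem in preflight(str(Path.home())):
        print("warning:", problem)
```

A Python interpreter running under Rosetta may report `x86_64` even on Apple Silicon hardware, so treat a failed architecture check as a prompt to look closer rather than a definitive answer.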

Use Cases

Professional Voice Overs: Generate high-quality narration for videos or presentations without expensive studio equipment.
Content Localization: Synthesize translated scripts in multiple languages while maintaining a consistent voice identity.
Accessibility & Assistive Tech: Create natural, human-like voice feedback for applications or reading assistants.
Creative AI Projects: Clone voices for character narration in games, animations, or personalized audiobooks.

Example Prompts

"Use CosyVoice3 to narrate this article in a calm, professional tone using the default female voice."
"Clone my voice from 'reference.wav' and read the following text: 'Hello, this is a test of my synthetic twin.'"
"Synthesize this Chinese script into English using the voice from my saved assets, and set the speed to 1.2x."

Tips & Limitations

Reference Audio: When performing zero-shot cloning, ensure the audio is clear, free of background noise, and 3-10 seconds long for best results.
Tagging: Always include the <|endofprompt|> token in your reference text segments to help the model distinguish between prompt content and generated output.
Performance: While Apple Silicon is efficient, generating very long audio clips may take time. Break large blocks of text into smaller paragraphs for faster synthesis.
Storage: Ensure you have at least 5GB of free space before beginning the installation.
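The performance tip above suggests breaking long text into smaller pieces. One way to do that is sketched below. The helper name and the 200-character default are illustrative assumptions, not values from the skill. The function never merges separate paragraphs, and it packs whole sentences into chunks up to the limit.

```python
import re

LATIN_SENTENCE_END = ".!?"


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of whole sentences, each at most max_chars.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = " ".join(paragraph.split())  # normalize whitespace
        if not paragraph:
            continue
        # Split after Latin punctuation followed by whitespace, or directly
        # after CJK sentence-ending punctuation.
        sentences = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", paragraph)
        current = ""
        for sentence in sentences:
            sentence = sentence.strip()
            if not sentence:
                continue
            if not current:
                current = sentence
                continue
            # Re-insert a space only where the original had one.
            joiner = " " if current[-1] in LATIN_SENTENCE_END else ""
            candidate = current + joiner + sentence
            if len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized as a separate request and the resulting audio concatenated afterwards.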


Metadata

Author: @lhuaizhong

Stars: 1656

Updated: 2026-02-28



Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-lhuaizhong-cosyvoice3-macos": {
      "enabled": true,
      "auto_update": true
    }
  }
}
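If clawhub.json already lists other plugins, pasting the snippet over the file would discard them. The sketch below merges the entry into an existing configuration instead. It assumes the file is plain JSON with a top-level "plugins" object, as the snippet suggests. The location of clawhub.json is not stated here, so the path is left as a parameter.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-lhuaizhong-cosyvoice3-macos"
PLUGIN_SETTINGS = {"enabled": True, "auto_update": True}


def with_plugin(config: dict) -> dict:
    """Return a copy of config with the plugin entry added or replaced."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[PLUGIN_ID] = dict(PLUGIN_SETTINGS)
    merged["plugins"] = plugins
    return merged


def update_config_file(path: Path) -> None:
    """Merge the plugin entry into the JSON file at path, creating it if absent."""
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        config = {}
    updated = json.dumps(with_plugin(config), indent=2) + "\n"
    path.write_text(updated, encoding="utf-8")
```

Calling `update_config_file(Path("clawhub.json"))` from the directory that holds the file would apply the change, leaving any other plugin entries untouched.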

Tags(AI)

#tts #voice-cloning #apple-silicon #audio-synthesis #local-ai

Safety Score: 4/5

Flags: file-read, file-write, code-execution