Official Verified media Safety 4/5

zhipu-tts

Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chinese text synthesis with multiple voice personas, speed control, and output formats.

Why use this skill?

The Zhipu AI TTS skill for OpenClaw converts Chinese text into natural-sounding audio. It offers multiple voices, speed control, and a one-command installation.

What This Skill Does

The zhipu-tts skill uses Zhipu AI's GLM-TTS (text-to-speech) engine to convert written Chinese text into natural-sounding audio. It integrates with the OpenClaw ecosystem and lets developers and creators generate voiceovers, announcements, and narrations. It supports a range of voice personas, speed control, and output formats including WAV and PCM.

Installation

To begin using this skill, ensure you have the OpenClaw agent environment configured. Install the skill directly from the repository using the following command:

clawhub install openclaw/skills/skills/franklu0819-lang/zhipu-tts

After installation, obtain an API key from the Zhipu AI Console and set it as an environment variable in your terminal session:

export ZHIPU_API_KEY="your_key_here"
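
If you call the skill from your own scripts, it can help to check for the key up front so a missing variable fails with a clear message. The following is a minimal sketch, not part of the skill itself: the helper name get_api_key is hypothetical, and only the variable name ZHIPU_API_KEY comes from this page.

```python
import os


def get_api_key(env=None):
    """Return the Zhipu API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("ZHIPU_API_KEY", "").strip()
    # Treat the placeholder from the install instructions as "not set".
    if not key or key == "your_key_here":
        raise RuntimeError(
            "ZHIPU_API_KEY is not set; export it before running the skill."
        )
    return key
```

Calling get_api_key() once at startup surfaces a configuration problem before any text is sent for synthesis.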

Use Cases

This skill is ideal for a wide variety of scenarios, including:

  1. Customer Service: Automating responses for telephone systems or interactive kiosks with professional, friendly tones.
  2. Content Creation: Generating voice-overs for video content, social media snippets, or character dialogue in creative projects.
  3. Accessibility: Converting textual notifications or documents into audible formats for users who prefer listening.
  4. Education: Providing audio cues for language learning tools or interactive reading materials.

Example Prompts

  1. "Generate a warm, professional greeting for our new office phone system using the 'tongtong' voice at a normal speed, save it as 'welcome.wav'."
  2. "Create an energetic announcement for our upcoming flash sale using the 'xiaochen' persona at 1.3 speed."
  3. "Convert this article segment into a calm, deeper male narration for my podcast background using the 'chuichui' voice at 0.9 speed."
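
Each prompt above boils down to a few parameters: the text, a voice, a speed, and an output format. As a rough sketch of how those could be validated before a request is made, the class below uses the limits stated on this page (1024 characters, speed 0.5 to 2.0, WAV or PCM). The class name, field names, and defaults are hypothetical, and the voice set contains only the names mentioned in this listing; the skill's actual interface and full voice list may differ.

```python
from dataclasses import dataclass

# Voice names mentioned in this listing; the set the API accepts may be larger.
KNOWN_VOICES = {"tongtong", "xiaochen", "chuichui", "jam", "kazi"}
FORMATS = {"wav", "pcm"}


@dataclass(frozen=True)
class TtsRequest:
    """Hypothetical container for one synthesis request."""

    text: str
    voice: str = "tongtong"  # assumed default, not confirmed by the source
    speed: float = 1.0
    fmt: str = "wav"

    def __post_init__(self):
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if len(self.text) > 1024:
            raise ValueError("text exceeds 1024 characters; split it first")
        if self.voice not in KNOWN_VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        if self.fmt not in FORMATS:
            raise ValueError("fmt must be 'wav' or 'pcm'")


# The second prompt expressed as parameters.
flash_sale = TtsRequest("限时抢购现在开始!", voice="xiaochen", speed=1.3)
```

Validating locally keeps obviously invalid requests, such as a speed of 3.0, from ever reaching the external API.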

Tips & Limitations

To ensure optimal performance, keep the following guidelines in mind:

  • Character Limits: The API accepts up to 1024 characters per request. For longer documents, split the text into paragraphs that fit within the limit, synthesize each one in turn, and merge the resulting audio files.
  • Speed Selection: While the 0.5 to 2.0 range is supported, sticking between 0.9 and 1.2 is recommended for the most human-like delivery. Higher speeds are useful for dense information, while lower speeds are best for emphasis.
  • File Formats: Use WAV for general compatibility and quality. Use PCM only if you are building a system that requires raw data for real-time streaming or further post-processing.
  • Voice Selection: Test different personas ('jam', 'kazi', etc.) to match the specific character of your content; entertainment-focused projects benefit significantly from these unique, non-traditional voice profiles.
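
The split-and-merge approach from the character-limit tip can be sketched as follows. This is one possible implementation under stated assumptions, not the skill's own code: split_text and merge_wavs are hypothetical helpers, the splitter breaks on line breaks and Chinese or Western sentence-ending punctuation, and the merge step uses Python's standard wave module and assumes every part was generated with the same audio settings.

```python
import re
import wave

MAX_CHARS = 1024  # per-request limit stated in this listing


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Each paragraph (line) starts a new chunk. Within a paragraph, chunks break
    after sentence-ending punctuation, with a hard cut only when a single
    sentence is longer than the limit.
    """
    chunks = []
    for para in (p.strip() for p in text.split("\n")):
        if not para:
            continue
        current = ""
        # Zero-width split right after sentence-ending punctuation.
        for sentence in re.split(r"(?<=[。!?!?])", para):
            # Cut an overlong sentence into limit-sized pieces.
            for i in range(0, len(sentence), limit):
                piece = sentence[i:i + limit]
                if current and len(current) + len(piece) > limit:
                    chunks.append(current)
                    current = ""
                current += piece
        if current:
            chunks.append(current)
    return chunks


def merge_wavs(paths, out_path):
    """Concatenate WAV files that share channels, sample width, and rate."""
    if not paths:
        raise ValueError("no input files to merge")
    expected = None
    with wave.open(out_path, "wb") as out:
        for path in paths:
            with wave.open(path, "rb") as part:
                fmt = part.getparams()[:3]  # channels, sample width, frame rate
                if expected is None:
                    expected = fmt
                    out.setparams(part.getparams())
                elif fmt != expected:
                    raise ValueError(f"{path} has different audio parameters")
                out.writeframes(part.readframes(part.getnframes()))
```

A typical flow would call split_text on the document, synthesize each chunk to its own WAV file with the skill, and then pass the file paths in order to merge_wavs. Raw PCM output has no header, so PCM parts could instead be concatenated byte for byte.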

Metadata

Stars: 2387
Views: 1
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-franklu0819-lang-zhipu-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
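
If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The sketch below shows one way to merge the entry programmatically. It assumes clawhub.json is a plain JSON file with a top-level "plugins" object, as the snippet above suggests; the function name enable_plugin is hypothetical and not part of any ClawHub tooling.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-franklu0819-lang-zhipu-tts"  # key taken from the snippet above


def enable_plugin(config_path, plugin_id=PLUGIN_ID):
    """Add or update the plugin entry while leaving other settings untouched."""
    path = Path(config_path)
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins.setdefault(plugin_id, {}).update({"enabled": True, "auto_update": True})
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return config
```

For example, enable_plugin("clawhub.json") run from the directory containing the file adds the entry and preserves any plugins already configured.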

Tags (AI)

#tts #audio #speech #zhipu #chinese

Safety Score: 4/5

Flags: external-api, file-write