Official Verified media Safety 4/5

zhipu-tts

Text-to-speech conversion using the Zhipu AI (BigModel) GLM-TTS model. Use it when you need to convert text to audio files with a choice of voices. Supports Chinese text synthesis with multiple voice personas, speed control, and several output formats.

Why use this skill?

Easily convert Chinese text into natural-sounding audio with the Zhipu AI TTS skill for OpenClaw. Features multiple voices, speed control, and easy installation.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/franklu0819-lang/zhipu-tts

What This Skill Does

The zhipu-tts skill uses Zhipu AI's GLM-TTS (Text-to-Speech) engine to convert written Chinese text into natural-sounding audio. It is designed to plug into the OpenClaw ecosystem and lets developers and creators generate voiceovers, announcements, and narrations. It supports a range of voice personas, speed control, and WAV and PCM output, which makes it suitable for both professional and creative applications.

Installation

To begin using this skill, ensure you have the OpenClaw agent environment configured. Install the skill directly from the repository using the following command:

clawhub install openclaw/skills/skills/franklu0819-lang/zhipu-tts

After installation, obtain an API key from the Zhipu AI Console and set it as an environment variable in your terminal session:

export ZHIPU_API_KEY="your_key_here"
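
If you call the skill from your own script, it can help to fail early when the key is missing. The following is a minimal sketch, not part of the skill itself; the helper name get_zhipu_api_key is hypothetical, and only the ZHIPU_API_KEY variable name comes from the instructions above.

```python
import os


def get_zhipu_api_key(env=None):
    """Return the key stored in ZHIPU_API_KEY, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    This helper is illustrative and is not provided by the zhipu-tts skill.
    """
    env = os.environ if env is None else env
    key = env.get("ZHIPU_API_KEY", "").strip()
    # Treat the placeholder from the install instructions as "not set".
    if not key or key == "your_key_here":
        raise RuntimeError(
            "ZHIPU_API_KEY is not set; export it before running the skill."
        )
    return key
```

Calling get_zhipu_api_key() at the start of a script surfaces a missing key before any text is sent for synthesis.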

Use Cases

This skill is ideal for a wide variety of scenarios, including:

Customer Service: Automating responses for telephone systems or interactive kiosks with professional, friendly tones.
Content Creation: Generating voice-overs for video content, social media snippets, or character dialogue in creative projects.
Accessibility: Converting textual notifications or documents into audible formats for users who prefer listening.
Education: Providing audio cues for language learning tools or interactive reading materials.

Example Prompts

"Generate a warm, professional greeting for our new office phone system using the 'tongtong' voice at normal speed, and save it as 'welcome.wav'."
"Create an energetic announcement for our upcoming flash sale using the 'xiaochen' persona at 1.3 speed."
"Convert this article segment into a calm, deeper male narration for my podcast background using the 'chuichui' voice at 0.9 speed."

Tips & Limitations

To ensure optimal performance, keep the following guidelines in mind:

Character Limits: The API supports up to 1024 characters per request. For longer documents, split the text into pieces of at most 1024 characters (ideally at paragraph or sentence boundaries), synthesize each piece in a loop, and merge the resulting audio files.
Speed Selection: While the 0.5 to 2.0 range is supported, sticking between 0.9 and 1.2 is recommended for the most human-like delivery. Higher speeds are useful for dense information, while lower speeds are best for emphasis.
File Formats: Use WAV for general compatibility and quality. Use PCM only if you are building a system that requires raw data for real-time streaming or further post-processing.
Voice Selection: Test different personas ('jam', 'kazi', etc.) to match the specific character of your content; entertainment-focused projects benefit significantly from these unique, non-traditional voice profiles.
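
The split-and-merge approach from the Character Limits tip can be sketched as follows. This is a minimal illustration under stated assumptions: the function names split_text and merge_wav are hypothetical, the synthesis call itself is left out because this page does not show the skill's interface, and merging assumes every WAV part shares the same channel count, sample width, and sample rate. Only the Python standard library is used.

```python
import re
import wave

MAX_CHARS = 1024  # per-request limit stated in the tips above


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Prefers to break after Chinese or Western sentence punctuation and
    newlines; a single sentence longer than `limit` is cut hard.
    """
    sentences = [s for s in re.split(r"(?<=[。！？!?\.\n])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        while len(sentence) > limit:  # oversized sentence: hard split
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


def merge_wav(paths, out_path):
    """Concatenate WAV files that share the same audio format."""
    if not paths:
        raise ValueError("no input files to merge")
    with wave.open(out_path, "wb") as out:
        params = None
        for path in paths:
            with wave.open(path, "rb") as part:
                if params is None:
                    params = part.getparams()
                    out.setparams(params)
                elif part.getparams()[:3] != params[:3]:
                    # channels, sample width, or frame rate differ
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(part.readframes(part.getnframes()))
```

A typical flow would be to call split_text on the document, synthesize each chunk to its own WAV file with the skill, and then pass the file paths in order to merge_wav. Raw PCM output has no header, so PCM parts could instead be concatenated byte for byte.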

Metadata

Author: @franklu0819-lang

Stars: 2387

Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-franklu0819-lang-zhipu-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
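
If clawhub.json already contains other plugins, pasting by hand risks overwriting them. The sketch below merges the entry programmatically instead. It assumes clawhub.json is plain JSON with a top-level "plugins" object, as the snippet above suggests; the helper name enable_plugin is hypothetical and is not a ClawHub command.

```python
import json
from pathlib import Path

# Plugin key copied from the configuration snippet above.
PLUGIN_ID = "official-franklu0819-lang-zhipu-tts"


def enable_plugin(config_path, plugin_id=PLUGIN_ID):
    """Add or update one plugin entry while keeping all other settings."""
    path = Path(config_path)
    # Start from the existing file if there is one, else from an empty config.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return config
```

For example, enable_plugin("clawhub.json") leaves any existing plugin entries untouched and only adds or refreshes the zhipu-tts entry.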

Tags (AI)

#tts #audio #speech #zhipu #chinese

Safety Score: 4/5

Flags: external-api, file-write

Related Skills

zhipu-asr

Automatic Speech Recognition (ASR) using the Zhipu AI (BigModel) GLM-ASR model. Use it when you need to transcribe audio files to text. Supports Chinese audio transcription with context prompts, custom hotwords, and multiple audio formats.

franklu0819-lang 2387

feishu-file

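Feishu file-sending skill. Sends files of many kinds to Feishu chats, including documents, images, and archives; it detects the file type automatically and handles the upload.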

franklu0819-lang 2387

clawhub-manager

ClawHub skill management tool. Wraps skill publishing, deletion, lookup, and search to make managing skills on ClawHub easier.

franklu0819-lang 2387

feishu-voice

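Feishu voice message skill. Converts text to speech and sends it to Feishu; supports TTS generation, format conversion, duration reading, file upload, and message sending.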

franklu0819-lang 2387

md2pdf

Markdown-to-PDF skill. Converts Markdown files into polished PDF documents, with full support for Chinese text, code highlighting, and custom styles.

franklu0819-lang 2387