Official Verified media Safety 4/5

zai-tts

Text-to-speech conversion using the GLM-TTS service via the `uvx zai-tts` command for generating audio from text. Use when: (1) The user requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, podcast, driving, cooking). (3) Pre-cloned voices are to be used for speech.

What This Skill Does

The zai-tts skill lets OpenClaw agents convert text into audio files using the GLM-TTS service. Agents can respond audibly, narrate content, and communicate accessibly instead of relying on text alone. The skill supports adjustable speaking speed, adjustable volume, and a choice of pre-cloned or system-default voices.

Installation

To integrate zai-tts into your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/al-one/zai-tts

Before running the skill, configure your authentication credentials. To obtain your ZAI_AUDIO_USERID and ZAI_AUDIO_TOKEN, log in to audio.z.ai, open your browser developer tools (F12), and inspect the `localStorage['auth-storage']` value. Export both values as environment variables so that the skill can authorize its requests to the service.
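As a minimal sketch of the export step described above, the following POSIX shell snippet sets the two variables and checks that both are non-empty. The placeholder values and the `check_zai_credentials` helper are hypothetical and not part of zai-tts; replace the placeholders with the values read from your own browser session.

```shell
# Placeholder values (assumptions): replace with the values found in
# localStorage['auth-storage'] on audio.z.ai.
export ZAI_AUDIO_USERID="your-user-id"
export ZAI_AUDIO_TOKEN="your-token"

# Hypothetical helper, not part of zai-tts: returns 0 when both
# variables are set and non-empty, 1 otherwise.
check_zai_credentials() {
    missing=0
    if [ -z "${ZAI_AUDIO_USERID:-}" ]; then
        echo "ZAI_AUDIO_USERID is not set" >&2
        missing=1
    fi
    if [ -z "${ZAI_AUDIO_TOKEN:-}" ]; then
        echo "ZAI_AUDIO_TOKEN is not set" >&2
        missing=1
    fi
    return "$missing"
}
```

Calling `check_zai_credentials` prints the name of any missing variable to standard error, which makes an incomplete setup easy to spot before the first request.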

Use Cases

This skill suits scenarios that call for spoken output. Use it when users explicitly request voice output or when accessibility needs make audio preferable to on-screen text. It also fits multitasking workflows, such as generating podcasts from long-form articles, creating voiceovers for presentations, or providing spoken instructions for hands-busy activities like cooking or driving.

Example Prompts

"Convert this article into a podcast episode and save it as episode_01.wav, using the Chloe voice for a professional tone."
"Read the summary of the meeting notes out loud to me, but increase the speaking speed to 1.5 so I can listen quickly."
"Create an audio guide for this text file, but use the Ethan voice and set the volume to 2 for better clarity in noisy environments."

Tips & Limitations

Keep your authentication token valid; if the tool fails, read the `localStorage['auth-storage']` value from audio.z.ai again and update your environment variables. Run `uvx zai-tts -l` periodically to list the available voices, since new custom-cloned voices appear there once they have been processed on the web platform. High-volume processing may consume significant system resources or API usage limits, so batch your text requests where possible. The skill depends on network connectivity to the GLM-TTS service, so make sure your firewall allows outgoing requests to the audio.z.ai endpoints.
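The voice-listing tip can be wrapped in a small guard so that a missing prerequisite fails early with a clear message. This is a sketch only: `list_zai_voices` is a hypothetical helper name, the credential check assumes that listing voices needs the same credentials as speech generation, and the only zai-tts invocation used is the `uvx zai-tts -l` command mentioned above.

```shell
# Hypothetical helper, not part of zai-tts: lists the available voices
# after checking that uvx is installed and the credentials are exported.
list_zai_voices() {
    if ! command -v uvx >/dev/null 2>&1; then
        echo "uvx not found on PATH; install uv first" >&2
        return 127
    fi
    # Assumption: listing voices requires the same credentials as synthesis.
    if [ -z "${ZAI_AUDIO_USERID:-}" ] || [ -z "${ZAI_AUDIO_TOKEN:-}" ]; then
        echo "ZAI_AUDIO_USERID and ZAI_AUDIO_TOKEN must be set" >&2
        return 1
    fi
    # The only zai-tts flag used here is -l, taken from the text above.
    uvx zai-tts -l
}
```

Running `list_zai_voices` after a cloned voice has finished processing on the web platform is one way to confirm that it is available.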

Metadata

Author: @al-one

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-al-one-zai-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#tts #audio #speech #accessibility #narration

Safety Score: 4/5

Flags: file-write, file-read, external-api

Related Skills

maishou

Cross-platform product price comparison skill. Gets the best prices and coupons for goods on Taobao (淘宝), Tmall (天猫), JD.com (京东), Pinduoduo (拼多多), Douyin (抖音), and Kuaishou (快手). Use when users want to shop or get discount information.

al-one 4473

edgeone

Deploy HTML content to EdgeOne Pages, return the public URL.

al-one 4473

mcp-lark

Manage user information, chats, emails, cloud documents, multidimensional tables, tasks, calendars, and more through the OpenAPI MCP server for FeiShu (飞书) / Lark.

al-one 4473

edge-tts-uvx

Text-to-speech conversion using `uvx edge-tts` for generating audio from text. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.

al-one 4473

mcp-hass

Control Home Assistant smart home devices and query their states using MCP.

al-one 4473