zai-tts
Text-to-speech conversion using the GLM-TTS service via the `uvx zai-tts` command, for generating audio from text. Use when: (1) The user requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, podcast, driving, cooking). (3) The user wants speech in a pre-cloned voice.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/al-one/zai-tts
What This Skill Does
The zai-tts skill lets OpenClaw agents convert text into natural-sounding audio files using the GLM-TTS service, so agents can give audible responses, narrate content, and support accessible communication instead of replying with text only. It supports adjustable speaking speed and volume, and a choice of pre-cloned or system-default voices.
Installation
To integrate zai-tts into your OpenClaw environment, run the following command in your terminal:
clawhub install openclaw/skills/skills/al-one/zai-tts
Before running the skill, configure your authentication credentials. To obtain ZAI_AUDIO_USERID and ZAI_AUDIO_TOKEN, log in to audio.z.ai, open your browser developer tools (F12), and inspect the localStorage['auth-storage'] value. Export both as environment variables so that the service requests are authorized.
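The credential setup described above can be sketched as follows. The two variable names come from this page; the values shown are placeholders, and check_zai_credentials is a hypothetical helper (not part of zai-tts) that only confirms the variables are non-empty. It does not verify that the token is still valid.

```shell
# Placeholder values: replace them with the ones copied from localStorage['auth-storage'].
export ZAI_AUDIO_USERID="your-user-id"
export ZAI_AUDIO_TOKEN="your-token"

# Hypothetical helper (not part of zai-tts): report any credential that is unset or empty.
# Returns 0 when both variables are present, 1 otherwise.
check_zai_credentials() {
  missing=0
  for name in ZAI_AUDIO_USERID ZAI_AUDIO_TOKEN; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_zai_credentials && echo "credentials are set"
```

To keep the variables across sessions, the two export lines could go in your shell profile; where exactly depends on your shell and is left as an assumption here.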
Use Cases
Use this skill when users explicitly request voice output, or when accessibility needs call for content to be read aloud. It suits multitasking workflows, such as generating podcasts from long-form articles, creating voiceovers for presentations, or providing spoken instructions for hands-busy activities like cooking or driving.
Example Prompts
- "Convert this article into a podcast episode and save it as episode_01.wav, using the Chloe voice for a professional tone."
- "Read the summary of the meeting notes out loud to me, but increase the speaking speed to 1.5 so I can listen quickly."
- "Create an audio guide for this text file, but use the Ethan voice and set the volume to 2 for better clarity in noisy environments."
Tips & Limitations
Keep your authentication tokens valid; if the tool fails, log in to audio.z.ai again and refresh the values from localStorage['auth-storage']. Run `uvx zai-tts -l` periodically to list the available voices, since new custom-cloned voices appear there once they have been processed on the web platform. High-volume processing may consume significant system resources or API usage limits, so batch your text requests where possible. The skill needs network connectivity to the GLM-TTS service, so make sure your firewall allows outgoing requests to the audio.z.ai endpoints.
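As a minimal sketch of the voice-listing tip, the wrapper below first checks that uvx is on the PATH and then runs the `-l` flag mentioned above. The function name list_zai_voices and its exit code are hypothetical, chosen for illustration; only `uvx zai-tts -l` comes from this page.

```shell
# Hypothetical wrapper (not part of zai-tts): list voices only when uvx is available.
list_zai_voices() {
  if ! command -v uvx >/dev/null 2>&1; then
    echo "uvx not found on PATH; install uv first" >&2
    return 127
  fi
  # -l is the voice-listing flag described in the tips above.
  uvx zai-tts -l
}
```

Calling list_zai_voices after exporting ZAI_AUDIO_USERID and ZAI_AUDIO_TOKEN should print the voices available to your account, assuming the audio.z.ai endpoints are reachable.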
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-al-one-zai-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, external-api
Related Skills
maishou
Price-comparison skill for Chinese e-commerce platforms: finds the best price and coupons for goods on Taobao, TMall, JD.com, PinDuoDuo, Douyin, and KaiShou. Use when users want to shop or get discount information.
edgeone
Deploy HTML content to EdgeOne Pages, return the public URL.
mcp-lark
Based on the OpenAPI MCP server of FeiShu (飞书) / Lark; manages user information, chats, emails, cloud documents, multidimensional tables, tasks, calendars, etc.
edge-tts-uvx
Text-to-speech conversion using `uvx edge-tts` for generating audio from text. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
mcp-hass
Controls Home Assistant smart home devices and queries their states using the MCP protocol.