What This Skill Does

The Zvukogram skill is a text-to-speech (TTS) integration for OpenClaw built on the Zvukogram API. It converts text input into natural-sounding audio and supports SSML markup for control over pronunciation, speed, and pauses. Typical uses include voiceovers for podcasts, voice notifications for system events, and multi-voice dialogues. The skill also supports stress marks to fix the pronunciation of individual words, and alias tagging for custom transcriptions of English words into Russian.

Installation

First make sure the OpenClaw framework is installed, then install the skill with: clawhub install openclaw/skills/skills/erview/zvukogram. Next, provide your credentials in one of two ways: create the configuration file ~/.config/zvukogram/config.json containing your API token and account email, or set the environment variables ZVUKOGRAM_TOKEN and ZVUKOGRAM_EMAIL. To verify the setup, run python3 scripts/balance.py and confirm that the API recognizes your credentials.
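
The two credential sources described above can be resolved with a few lines of Python. This is only a sketch: the JSON key names "token" and "email", the function name load_credentials, and the choice to let environment variables take precedence over the file are assumptions for illustration, not confirmed behavior of the skill's own scripts.

```python
import json
import os
from pathlib import Path

# Config location named in the installation notes.
CONFIG_PATH = Path.home() / ".config" / "zvukogram" / "config.json"


def load_credentials(env=None, config_path=CONFIG_PATH):
    """Return (token, email) from the environment or the config file.

    Assumptions: environment variables win over the file, and the file
    stores the values under the keys "token" and "email".
    """
    env = os.environ if env is None else env
    token = env.get("ZVUKOGRAM_TOKEN")
    email = env.get("ZVUKOGRAM_EMAIL")

    path = Path(config_path)
    if not (token and email) and path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        # Fill in only the values the environment did not provide.
        token = token or data.get("token")
        email = email or data.get("email")

    if not token or not email:
        raise RuntimeError(
            "Zvukogram credentials not found: set ZVUKOGRAM_TOKEN and "
            "ZVUKOGRAM_EMAIL or create " + str(path)
        )
    return token, email
```

Calling load_credentials() with no arguments reads the real environment and the default path; passing env and config_path explicitly makes the lookup easy to exercise in isolation.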

Use Cases

Podcasting & Content Creation: Produce multi-character scripts by generating each part with a different voice profile and merging the resulting audio fragments.
System Notifications: Integrate audio alerts into your monitoring workflows to get vocal status updates when tasks complete.
News & Articles: Transform long-form written content into audio formats for accessibility and consumption on the go.
Dynamic Language Learning: Use SSML stress marks to provide auditory examples of correct pronunciation for challenging technical terms.

Example Prompts

"Generate a 30-second audio clip using the voice of Alena that reads the following welcome message: <prosody rate='1.1'>Welcome to the OpenClaw dashboard.</prosody>"
"Use the Andrei voice to read this text and save it as notification.mp3, making sure to pronounce GPT as <sub alias='Джи Пи Ти'>GPT</sub>."
"Convert the article about neural networks to speech. Use a fast rate for the introduction and add a 500ms break between the title and the body."
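
The markup in these prompts can also be assembled programmatically before the text is sent for synthesis. The sketch below builds the three constructs the prompts mention (a prosody rate, a sub alias, and a timed break). The helper names are hypothetical, and the exact attribute syntax Zvukogram accepts for breaks is an assumption based on standard SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, rate):
    """Wrap plain text in a <prosody> tag with the given speaking rate."""
    return f"<prosody rate={quoteattr(str(rate))}>{escape(text)}</prosody>"


def sub(text, alias):
    """Ask the engine to pronounce `text` as `alias`."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def pause(milliseconds):
    """Insert a pause, using the standard SSML <break time="..."/> form."""
    return f'<break time="{int(milliseconds)}ms"/>'


# Title, a 500 ms pause, then a slightly faster body with a custom transcription.
ssml = (
    "Neural networks"
    + pause(500)
    + prosody("Welcome to the OpenClaw dashboard.", 1.1)
    + " "
    + sub("GPT", "Джи Пи Ти")
)
print(ssml)
```

The helpers escape their text argument, so characters such as & or < in the input cannot break the markup; as a consequence they take plain text only and are not meant to be nested inside one another.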

Tips & Limitations

The standard text endpoints accept at most 1000 characters per request; the /longtext endpoint handles up to 1 million characters. SSML tags such as <voice> work only in the web interface, not through the API, so build multi-voice projects by generating separate fragments and merging them with a tool such as ffmpeg. Check your voice selection against the official registry to confirm that the voice is available and suits the tone you want.
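
Both workarounds, staying under the 1000-character limit and merging fragments with ffmpeg, can be prepared in Python. This is a rough sketch under stated assumptions: split_text, concat_list, and concat_command are hypothetical helper names, the splitter handles plain text only (it would cut through SSML tags), and whether the API counts markup toward the limit is not stated in the source. The ffmpeg invocation uses the concat demuxer with stream copy, which expects all fragments to share the same codec and parameters.

```python
def split_text(text, limit=1000):
    """Split plain text into chunks of at most `limit` characters,
    breaking at whitespace where possible."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is cut into hard slices.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def concat_list(paths):
    """Build the list-file contents expected by ffmpeg's concat demuxer.
    Paths containing single quotes would need extra escaping."""
    return "".join(f"file '{p}'\n" for p in paths)


def concat_command(list_file, output):
    """Return the ffmpeg argument list that joins the listed fragments
    without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


chunks = split_text("word " * 450)  # about 2250 characters of input
print(len(chunks), max(len(c) for c in chunks))
print(concat_list(["alena.mp3", "andrei.mp3"]), end="")
print(" ".join(concat_command("fragments.txt", "dialogue.mp3")))
```

To run the merge, write the output of concat_list to the list file and pass the result of concat_command to subprocess.run. If the fragments differ in sample rate or codec, re-encoding instead of using -c copy is the safer choice.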
