Official Verified media Safety 4/5

zvukogram

Text-to-Speech via Zvukogram API with SSML support. Use when you need to generate speech from text, create podcasts, voice notifications, or work with audio. Supports speed control, stress marks, English word transcription, and audio fragment merging.

Why use this skill?

The Zvukogram skill converts text to speech from within OpenClaw. It offers SSML support, voice control, and audio merging for voiceovers and notifications.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/erview/zvukogram

What This Skill Does

The Zvukogram skill is a text-to-speech (TTS) integration for OpenClaw built on the Zvukogram API. It converts text input into natural-sounding audio and supports SSML markup for fine-grained control over pronunciation, speed, and pauses. Typical uses include generating voiceovers for podcasts, automating voice notifications for system events, and producing multi-voice dialogues. The skill also supports stress marks, so that words are pronounced with the correct emphasis, and alias tagging for custom English-to-Russian word transcriptions.

Installation

First make sure the OpenClaw framework is installed, then run clawhub install openclaw/skills/skills/erview/zvukogram. Next, provide your credentials in one of two ways: create the configuration file ~/.config/zvukogram/config.json containing your API token and account email, or set the environment variables ZVUKOGRAM_TOKEN and ZVUKOGRAM_EMAIL. To verify the setup, run python3 scripts/balance.py and check that the API recognizes your credentials.
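
As a rough illustration of the two credential sources described above, the following Python sketch reads the environment variables first and falls back to the configuration file. It is not the skill's own code. The JSON key names "token" and "email" and the environment-over-file precedence are assumptions made for this example; check the skill's documentation for the actual config schema.

```python
import json
import os
from pathlib import Path

# Config location named in the installation instructions.
CONFIG_PATH = Path.home() / ".config" / "zvukogram" / "config.json"


def load_credentials(env=None, config_path=CONFIG_PATH):
    """Return (token, email) from environment variables or the config file.

    Assumptions (not confirmed by the skill's docs): the config file uses
    the keys "token" and "email", and environment variables take precedence.
    """
    env = os.environ if env is None else env
    token = env.get("ZVUKOGRAM_TOKEN")
    email = env.get("ZVUKOGRAM_EMAIL")

    path = Path(config_path)
    if not (token and email) and path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        token = token or data.get("token")  # hypothetical key name
        email = email or data.get("email")  # hypothetical key name

    if not (token and email):
        raise RuntimeError(
            "Zvukogram credentials not found: set ZVUKOGRAM_TOKEN and "
            "ZVUKOGRAM_EMAIL or create " + str(path)
        )
    return token, email
```

Calling load_credentials() with no arguments uses the real environment and the default config path; passing env and config_path explicitly makes the function easy to exercise in isolation.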

Use Cases

  • Podcasting & Content Creation: Generate multi-character scripts by merging audio fragments using different voice profiles.
  • System Notifications: Integrate audio alerts into your monitoring workflows to get vocal status updates when tasks complete.
  • News & Articles: Transform long-form written content into audio formats for accessibility and consumption on the go.
  • Dynamic Language Learning: Use SSML stress marks to provide auditory examples of correct pronunciation for challenging technical terms.

Example Prompts

  1. "Generate a 30-second audio clip using the voice of Alena that reads the following welcome message: <prosody rate='1.1'>Welcome to the OpenClaw dashboard.</prosody>"
  2. "Use the Andrei voice to read this text and save it as notification.mp3, making sure to pronounce GPT as <sub alias='Джи Пи Ти'>GPT</sub>."
  3. "Convert the article about neural networks to speech. Use a fast rate for the introduction and add a 500ms break between the title and the body."
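
The SSML fragments in the prompts above can also be assembled programmatically. The Python sketch below builds the three kinds of markup the prompts mention (a speaking rate, an alias substitution, and a pause) using standard SSML element names. The helper names are invented for this example, and whether the Zvukogram API accepts each tag in exactly this form is an assumption to verify against its documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, rate):
    """Wrap plain text in a <prosody> element with the given speaking rate."""
    return f"<prosody rate={quoteattr(str(rate))}>{escape(text)}</prosody>"


def sub(text, alias):
    """Ask the engine to read `text` as `alias` (for example, a transcription)."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def pause(milliseconds):
    """Insert a pause of the given length in milliseconds."""
    return f'<break time="{int(milliseconds)}ms"/>'


if __name__ == "__main__":
    # Title, a 500 ms pause, then a body sentence with a transcribed acronym.
    ssml = (
        prosody("Neural networks explained.", 1.1)
        + pause(500)
        + "This article was drafted with "
        + sub("GPT", "Джи Пи Ти")
        + "."
    )
    print(ssml)
```

The helpers escape their text argument, so they are meant for plain text only; passing already-built markup into prosody() would escape the inner tags.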

Tips & Limitations

The API limits standard text endpoints to 1000 characters per call; the /longtext endpoint accepts up to 1 million characters. SSML tags such as <voice> are not supported through the API and work only in the web interface, so multi-voice projects should be built by generating a separate fragment for each voice and merging the fragments with a tool such as ffmpeg. Before generating audio, check your chosen voice against the official registry to confirm that it is available and suits the tone you want.
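
The two workarounds above (staying under the 1000-character limit and merging fragments with ffmpeg) can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the function names are hypothetical, the splitter handles plain text only (it does not understand SSML tags), and it is an assumption that the limit counts characters the way len() does. The merge step only prepares input for ffmpeg's concat demuxer and does not run ffmpeg itself.

```python
import re

MAX_CHARS = 1000  # per-call limit stated for the standard text endpoints


def split_text(text, limit=MAX_CHARS):
    """Split plain text into chunks of at most `limit` characters,
    breaking at sentence boundaries where possible."""
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_list(paths):
    """Build the list-file contents for ffmpeg's concat demuxer."""
    # Single quotes inside a path are written as '\'' per ffmpeg quoting rules.
    return "".join("file '{}'\n".format(p.replace("'", "'\\''")) for p in paths)


def ffmpeg_concat_command(list_file, output):
    """Return the argument list for merging fragments without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]
```

To merge, write the result of concat_list() to a file and pass ffmpeg_concat_command() to subprocess.run. Stream copy (-c copy) expects all fragments to share the same codec and audio parameters; if they differ, re-encode instead.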

Metadata

Author: @erview
Stars: 2387
Views: 0
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-erview-zvukogram": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#tts #audio #voice #speech #ssml

Safety Score: 4/5

Flags: file-write, file-read, external-api