Official Verified media Safety 4/5

volcengine-ai-audio-tts

Text-to-speech generation on Volcengine (ByteDance) speech services. Use when users need narration, multi-language speech output, voice selection, or TTS troubleshooting. Supports the online one-shot HTTP API (openspeech.bytedance.com).

Why use this skill?

Integrate high-quality Volcengine Text-to-Speech into OpenClaw. Generate natural narration, multi-language speech, and custom audio from text.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/day253/day253-volcengine-ai-audio-tts

What This Skill Does

The volcengine-ai-audio-tts skill is an interface to Volcengine (ByteDance) text-to-speech services. It lets the OpenClaw AI agent convert text into natural-sounding speech using neural TTS. The skill supports multiple languages, a range of voice profiles, and configurable audio parameters such as pitch, speed, and volume. It calls the openspeech.bytedance.com API with one-shot HTTP requests, which suits standard-length text inputs, and can be used to add narration, accessibility features, or voice feedback to automated workflows.
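As a rough illustration of what a one-shot request with voice, speed, volume, and pitch settings might carry, the sketch below assembles a JSON body in Python. The field names (`app`, `user`, `audio`, `request`, `speed_ratio`, and so on) and the `uid` value are assumptions based on a commonly seen layout for this kind of API. This page does not confirm them, so check them against the official Volcengine API reference before use. The function name `build_tts_payload` is hypothetical.

```python
import json
import uuid


def build_tts_payload(text, app_id, token, cluster, voice_type,
                      encoding="mp3", speed=1.0, volume=1.0, pitch=1.0):
    """Assemble a JSON-serializable request body.

    All field names below are assumptions, not confirmed by the skill page.
    """
    return {
        "app": {"appid": app_id, "token": token, "cluster": cluster},
        "user": {"uid": "openclaw-demo"},  # hypothetical caller identifier
        "audio": {
            "voice_type": voice_type,
            "encoding": encoding,
            "speed_ratio": speed,
            "volume_ratio": volume,
            "pitch_ratio": pitch,
        },
        "request": {
            "reqid": str(uuid.uuid4()),  # unique id per request
            "text": text,
            "text_type": "plain",
            "operation": "query",
        },
    }


payload = build_tts_payload(
    "Welcome to our platform",
    app_id="my-app-id", token="my-token", cluster="my-cluster",
    voice_type="BV700_streaming", speed=1.1,
)
body = json.dumps(payload, ensure_ascii=False)
```

The resulting dict can be posted as JSON with requests. The endpoint path and the authorization header format are not given on this page, so they are left out here.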

Installation

Make sure Python 3.8+ is installed on your system, then install the required dependency with: pip install requests. Install the skill package via the ClawKit manager using the command: clawhub install openclaw/skills/skills/day253/day253-volcengine-ai-audio-tts. After installation, configure your credentials by setting the VOLCENGINE_TTS_APP_ID, VOLCENGINE_TTS_TOKEN, and VOLCENGINE_TTS_CLUSTER environment variables. The values come from your Volcengine Doubao Speech console (豆包语音控制台).
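Before the first synthesis call, it can help to confirm that all three credentials are actually set. The following is a minimal sketch in Python; the helper name `load_tts_credentials` is hypothetical and not part of the skill, and only the three environment variable names come from the instructions above.

```python
import os

REQUIRED_VARS = (
    "VOLCENGINE_TTS_APP_ID",
    "VOLCENGINE_TTS_TOKEN",
    "VOLCENGINE_TTS_CLUSTER",
)


def load_tts_credentials(env=None):
    """Return the three credential values, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_tts_credentials()` with no argument reads from the process environment; passing a plain dict makes the check easy to exercise in tests.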

Use Cases

This skill suits content creators generating voiceovers for videos, developers building accessibility-focused applications, and professionals who need automated narration for reports. Typical tasks include generating multilingual support responses, creating dynamic audio content for interactive media, and testing speech-enabled prototypes.

Example Prompts

  1. "Convert this article into a professional narration using the BV700_streaming voice and save it as an mp3 file in the output folder."
  2. "Please generate a Spanish audio clip for this script. Set the speed to 1.1x and make sure the output format is WAV."
  3. "Synthesize the following text: 'Welcome to our platform' using the default voice, but adjust the pitch slightly lower at 0.9."

Tips & Limitations

Keep each request's text under 1024 bytes. For longer documents, split the text into segments and process them one at a time to avoid timeouts or API rejections. Always verify your voice_type against the official Volcengine documentation to ensure it is compatible with your selected cluster. If you receive a 429 status code, wait briefly between requests to respect API rate limits. Make sure your output directory exists and is writable to avoid permission errors during file writes.
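The tips above can be sketched as three small Python helpers. This is an illustration under stated assumptions, not the skill's own implementation: the byte limit is measured on the UTF-8 encoding, the wait between retries uses exponential backoff (one possible reading of "wait briefly"), and `send` is a hypothetical callable that returns a `(status_code, body)` pair. All function names are hypothetical.

```python
import time
from pathlib import Path

SENTENCE_ENDS = ".!?。！？\n"


def split_by_bytes(text, max_bytes=1024):
    """Split text into chunks whose UTF-8 size is at most max_bytes.

    Prefers to cut just after sentence-ending punctuation; if a chunk has
    none, it cuts between characters at the size limit.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4")
    chunks = []
    start = 0          # index of the first character of the current chunk
    size = 0           # UTF-8 size of text[start:i]
    last_break = None  # index just past the latest sentence end in this chunk
    i = 0
    while i < len(text):
        char_size = len(text[i].encode("utf-8"))
        if size + char_size > max_bytes:
            cut = last_break if last_break is not None else i
            chunks.append(text[start:cut])
            start, i, size, last_break = cut, cut, 0, None
            continue
        size += char_size
        i += 1
        if text[i - 1] in SENTENCE_ENDS:
            last_break = i
    if start < len(text):
        chunks.append(text[start:])
    return chunks


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() again after a growing delay while it reports status 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status, body


def write_audio(path, data):
    """Create the parent directory if needed, then write the audio bytes."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

Joining the chunks returned by `split_by_bytes` reproduces the input exactly, so the audio segments can be concatenated in order. The `sleep` parameter exists only so the delay can be replaced in tests.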

Metadata

Author: @day253
Stars: 2387
Views: 0
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-day253-day253-volcengine-ai-audio-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#tts #speech #bytedance #narration #audio
Safety Score: 4/5

Flags: network-access, file-write, external-api