Official Verified media Safety 4/5

openai-tts

Text-to-speech conversion using OpenAI's TTS API for generating high-quality, natural-sounding audio. Supports 6 voices (alloy, echo, fable, onyx, nova, shimmer), speed control (0.25x-4.0x), an HD quality model, multiple output formats (mp3, opus, aac, flac), and automatic text chunking for long content (4096-character limit per request).

Use when:

(1) The user requests audio or voice output with triggers like "read this to me", "convert to audio", "generate speech", "text to speech", "tts", "narrate", or "speak", or the keywords "openai tts", "voice", or "podcast" appear.
(2) Content needs to be spoken rather than read (multitasking, accessibility).
(3) The user wants a specific voice ("alloy", "echo", "fable", "onyx", "nova", "shimmer") or a speed adjustment.

Why use this skill?

Integrate natural-sounding speech into your OpenClaw agent. The skill supports 6 voices, HD audio models, and long-text narration with automatic chunking.

What This Skill Does

The OpenAI TTS (Text-to-Speech) skill connects your OpenClaw agent to OpenAI's speech generation API. It converts plain text into high-quality, natural-sounding speech in any of six voices. Whether you are generating simple voice notifications, narrating long-form content, or building multimedia workflows, the skill handles the request logistics for you, including splitting text that exceeds the 4096-character limit per API request into chunks.
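The chunking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function name chunk_text and the sentence-boundary heuristic are assumptions, and only the 4096-character limit comes from this page.

```python
import re
from typing import List

MAX_CHARS = 4096  # per-request input limit stated on this page


def chunk_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` characters.

    Hypothetical helper: prefers sentence boundaries and falls back to a
    hard split only when a single sentence is longer than the limit.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit cannot fit anywhere: cut it hard.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # the sentence still fits in this chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps each audio segment ending at a natural pause, which tends to make the joins between segments less noticeable than cuts in the middle of a sentence.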

Installation

To integrate this skill into your environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/merend/openai-tts-python

Export OPENAI_API_KEY in your environment before running the skill. You also need Python 3.8+ and the openai library. The pydub library is recommended if you process long-form text, because the audio generated for each chunk must be concatenated afterwards.
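If you want to confirm these prerequisites before the first run, a small check like the following may help. It is a sketch under stated assumptions: check_prerequisites is a hypothetical helper that is not shipped with the skill, and it only tests for the requirements listed above.

```python
import importlib.util
import os
import sys
from typing import List


def check_prerequisites() -> List[str]:
    """Return a list of problems; an empty list means the environment looks ready.

    Hypothetical helper, not part of the skill itself.
    """
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec("openai") is None:
        problems.append("the openai package is not installed")
    if importlib.util.find_spec("pydub") is None:
        problems.append("optional: pydub is not installed (used to join audio chunks)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```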

Use Cases

This skill is designed for scenarios where auditory feedback or content consumption is preferred over traditional reading. Use it for:

Accessibility: Providing audio versions of text content for visually impaired users.
Multitasking: Converting articles, reports, or documentation into a podcast-like format for listening while commuting or performing other tasks.
Content Creation: Automating the voiceover process for video projects or presentations.
Notification Systems: Adding personalized voice alerts to your custom agent automations.

Example Prompts

"Convert this article into a podcast format using the 'onyx' voice so I can listen to it in the car."
"Read the following notes to me with a storytelling tone: [Insert Text Here]."
"Generate a speech file from this report; please use a slow speed and the 'nova' voice."
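As a rough illustration of how a prompt like the last one could map onto an API request, the sketch below validates the options listed on this page and then calls the speech endpoint of the openai Python library (version 1.x is assumed). The helper names build_speech_request and synthesize are hypothetical, and treating "slow" as a speed of 0.75 is an assumption; the skill's real mapping may differ.

```python
from typing import Any, Dict

# Options as listed on this page.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac"}
MODELS = {"tts-1", "tts-1-hd"}
MAX_CHARS = 4096


def build_speech_request(text: str, voice: str = "alloy", speed: float = 1.0,
                         model: str = "tts-1",
                         response_format: str = "mp3") -> Dict[str, Any]:
    """Validate the options and return keyword arguments for one TTS request."""
    if not text or len(text) > MAX_CHARS:
        raise ValueError(f"text must be 1-{MAX_CHARS} characters; chunk longer input first")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    return {"model": model, "voice": voice, "input": text,
            "speed": speed, "response_format": response_format}


def synthesize(request: Dict[str, Any], out_path: str) -> None:
    """Send one request and stream the audio to a file (assumes openai>=1.0)."""
    from openai import OpenAI  # lazy import so validation works without the package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)


# "slow speed and the 'nova' voice": 0.75 is an assumed value for "slow".
request = build_speech_request("Quarterly report summary.", voice="nova", speed=0.75)
```

Validating before the call means a bad voice name or an out-of-range speed fails locally instead of consuming an API request.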

Tips & Limitations

The OpenAI TTS API enforces a 4096-character limit per request. The skill splits longer input into chunks automatically, but each chunk is a separate API request, so processing time grows roughly in proportion to text length. Use the 'tts-1-hd' model when clarity is paramount, or 'tts-1' when lower latency matters more. Check your API usage limits before high-volume generation, since every request is billed to your OpenAI account.
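Because each chunk comes back as its own audio file, long texts need a joining step afterwards. The sketch below shows one way to do that with pydub, the library recommended under Installation. The function name concatenate_audio is hypothetical and the skill may join audio differently; note also that pydub relies on ffmpeg being installed to read and write mp3.

```python
from typing import Sequence


def concatenate_audio(chunk_paths: Sequence[str], out_path: str, fmt: str = "mp3") -> str:
    """Join per-chunk audio files, in order, into a single output file.

    Hypothetical helper; assumes pydub and ffmpeg are available.
    """
    if not chunk_paths:
        raise ValueError("no audio chunks to join")
    from pydub import AudioSegment  # lazy import: only needed for long texts

    combined = AudioSegment.empty()
    for path in chunk_paths:
        # Appending with += preserves the order of the original text.
        combined += AudioSegment.from_file(path, format=fmt)
    combined.export(out_path, format=fmt)
    return out_path
```

A call such as concatenate_audio(["part1.mp3", "part2.mp3"], "full.mp3") would produce one file from two chunk files; the file names here are placeholders.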

Metadata

Author: @merend

Stars: 1401

Updated: 2026-02-24

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-merend-openai-tts-python": {
      "enabled": true,
      "auto_update": true
    }
  }
}
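If your clawhub.json already lists other plugins, pasting the whole block over it would drop them. The following sketch merges the entry instead. It assumes the file uses the "plugins" layout shown above; enable_plugin is a hypothetical helper, and the path to clawhub.json is whatever your installation uses.

```python
import json
from pathlib import Path
from typing import Any, Dict

PLUGIN_ID = "official-merend-openai-tts-python"  # identifier from the snippet above


def enable_plugin(config_path: str, plugin_id: str = PLUGIN_ID) -> Dict[str, Any]:
    """Add or update one plugin entry while leaving other settings untouched."""
    path = Path(config_path)
    # Start from the existing configuration when the file is already there.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```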

Tags (AI)

#tts #openai #audio #voice #accessibility

Safety Score: 4/5

Flags: file-write, external-api