inworld-tts
Text-to-speech via Inworld.ai API. Use when generating voice audio from text, creating spoken responses, or converting text to MP3/audio files. Supports multiple voices, speaking rates, and streaming for long text.
Why use this skill?
Convert text to high-quality speech with the Inworld.ai TTS skill. Supports customizable voices, speaking rates, and streaming for long-form audio generation.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/gugic/inworld-tts
What This Skill Does
The inworld-tts skill provides a robust interface for interacting with Inworld.ai's advanced text-to-speech engine. Designed for OpenClaw agents, this skill enables the conversion of textual data into high-quality, natural-sounding audio. It supports advanced configuration, including voice selection, adjustable speaking rates, and temperature controls, allowing for precise control over the output character and cadence. Whether you are generating narrated responses for user interaction, creating audio content, or prototyping conversational AI, this skill serves as the bridge between raw text and expressive voice synthesis.
Installation
Installation requires only standard Unix-like utilities. First, generate an API key on the Inworld platform with the 'Voices: Read' permission. Set it as the INWORLD_API_KEY environment variable in your .bashrc or a local .env file so that it persists across sessions without being written into any script. You can clone the skill manually into your local skills directory or use the automated command: clawhub install openclaw/skills/skills/gugic/inworld-tts. After installation, make sure the script is executable by running chmod +x on the tts.sh file. Optionally, symlink the script into /usr/local/bin so you can trigger voice synthesis from any shell.
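The manual steps above could be wrapped in a small helper, sketched below. The function name link_tts and both directory arguments are illustrative assumptions, not paths or commands shipped with the skill; only tts.sh, chmod +x, the /usr/local/bin target, and INWORLD_API_KEY come from the text.

```shell
#!/usr/bin/env bash
# Sketch only: link_tts is a hypothetical helper, and both directory
# arguments are assumptions. Pass the paths that match your own setup.
link_tts() {
  local skill_dir bin_dir script
  bin_dir="${2:-/usr/local/bin}"          # where the symlink is created

  # Resolve the skill directory to an absolute path so the symlink
  # still works when the function is called with a relative path.
  skill_dir="$(cd "$1" 2>/dev/null && pwd)" || {
    echo "skill directory not found: $1" >&2
    return 1
  }
  script="$skill_dir/tts.sh"

  if [ ! -f "$script" ]; then
    echo "tts.sh not found in $skill_dir" >&2
    return 1
  fi

  chmod +x "$script"                      # make the script executable
  ln -sf "$script" "$bin_dir/tts"         # expose it on PATH as "tts"
}

# Keep the key out of scripts: export it from .bashrc or a local .env file.
# export INWORLD_API_KEY="<your key>"
```

Calling link_tts with the directory that holds tts.sh creates the link in /usr/local/bin by default; writing there usually needs elevated privileges, so a user-owned directory on your PATH can be passed as the second argument instead.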
Use Cases
This skill suits developers building interactive applications that need dynamic vocal feedback. Use it to:
- Turn chat logs or other logs into audio for accessibility.
- Generate spoken narration for automated reports.
- Prototype character-driven AI voice agents where individual voice profiles (like 'Dennis') are required for roleplay.
- Create MP3 files for external media use without manual recording.
- Process long-form text with the streaming mode, so large inputs are handled in chunks.
Example Prompts
- "Use the inworld-tts skill to convert the last five lines of the project status report into an MP3 file using the default voice at 1.1 speed."
- "Generate a narration file named welcome.mp3 for our app intro using the inworld-tts utility with a high-temperature setting for dramatic flair."
- "Convert this long documentation text into an audio file, enabling the streaming mode to ensure the process handles the size correctly."
Tips & Limitations
The INWORLD_API_KEY is a sensitive credential; never hardcode it into scripts. For text longer than 4000 characters, use the --stream flag to avoid memory overflows or request timeouts. The skill depends on curl, jq, and base64, so make sure all three are installed to avoid runtime errors. If you get empty output files, check that your API key has the 'Voices: Read' permission enabled in the Inworld dashboard; insufficient scopes are the most frequent cause of failed synthesis requests.
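As a rough illustration of these tips, the sketch below checks for the three dependencies and decides whether a given text is long enough to need --stream. The helper names check_deps and needs_stream are hypothetical and not part of the skill; only the tool names, the 4000-character limit, and the --stream flag come from the text.

```shell
#!/usr/bin/env bash
# Sketch only: check_deps and needs_stream are hypothetical helpers,
# not functions provided by the skill.

# Report every required tool that is missing from PATH.
# Returns 0 when all are present, 1 otherwise.
check_deps() {
  local missing=0 tool
  for tool in curl jq base64; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing dependency: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeed (exit status 0) when the text is longer than 4000 characters
# and should therefore be sent with --stream.
# Note: ${#text} counts characters according to the current locale.
needs_stream() {
  local text="$1"
  [ "${#text}" -gt 4000 ]
}
```

A wrapper might run check_deps first and stop on failure, then add --stream to the tts.sh arguments only when needs_stream succeeds for the input text. How tts.sh takes its other arguments is not covered here; consult the script itself for that.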
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-gugic-inworld-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: external-api, file-write, file-read