Official · Verified · Category: media · Safety: 4/5

elevenlabs

Text-to-speech, sound effects, music generation, voice management, and quota checks via the ElevenLabs API. Use when generating audio with ElevenLabs or managing voices.

Why use this skill?

Integrate ElevenLabs into OpenClaw for professional text-to-speech, sound effects, and music generation, with support for advanced v3 emotional audio tags and multiple output formats.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/odrobnik/elevenlabs

What This Skill Does

The ElevenLabs skill provides a powerful interface for integrating professional-grade AI audio synthesis directly into your OpenClaw agent workflows. It enables high-quality text-to-speech, custom sound effect generation, and music composition. By leveraging the ElevenLabs API, this skill allows users to generate diverse audio assets using a variety of cutting-edge models like Eleven v3 and Turbo v2.5. Whether you are narrating a story with emotional nuance using square-bracket audio tags or generating procedural sound effects for an application, this skill acts as your central control hub for all things audio.
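To make the shape of this concrete, here is a rough sketch of the underlying ElevenLabs text-to-speech call the skill wraps. This is based on the public API's request shape, not on the skill's internal code, and the voice ID is a placeholder you would replace with one from your own voice library:

```python
import os

# Placeholder voice ID — substitute an ID from your ElevenLabs voice library.
VOICE_ID = "your-voice-id"

def build_tts_request(text, model_id="eleven_v3", output_format="mp3_44100_128"):
    """Assemble the URL, headers, and JSON body for an ElevenLabs
    text-to-speech call (per the public API's request shape)."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
    headers = {
        "xi-api-key": os.environ.get("ELEVENLABS_API_KEY", ""),
        "Content-Type": "application/json",
    }
    body = {"text": text, "model_id": model_id}
    params = {"output_format": output_format}  # sent as a query parameter
    return url, headers, body, params

url, headers, body, params = build_tts_request(
    "[excited] It was the greatest discovery of our lifetime!"
)
# POST with any HTTP client, e.g.:
#   resp = requests.post(url, headers=headers, json=body, params=params)
#   open("narration.mp3", "wb").write(resp.content)
```

Note the square-bracket audio tag in the text: as described under Tips & Limitations below, only the v3 model interprets it as an emotion cue.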

Installation

To install this skill, use the ClawHub command-line interface provided within your OpenClaw environment:

clawhub install openclaw/skills/skills/odrobnik/elevenlabs

Ensure you have your ElevenLabs API credentials configured in your environment variables. Refer to the SETUP.md file within the skill directory for specific instructions on managing your API keys and prerequisites.
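The general shape of that environment setup, assuming the conventional `ELEVENLABS_API_KEY` variable name (confirm the exact name in SETUP.md), is:

```shell
# Export your ElevenLabs key so the skill can read it.
# ELEVENLABS_API_KEY is the conventional name; confirm the exact
# variable this skill expects in its SETUP.md.
export ELEVENLABS_API_KEY="your-api-key-here"

# Sanity check: confirm the variable is visible to child processes.
test -n "$ELEVENLABS_API_KEEY" 2>/dev/null
test -n "$ELEVENLABS_API_KEY" && echo "ElevenLabs key is set"
```

Add the export line to your shell profile or your agent's environment file so it persists across sessions.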

Use Cases

  • Character Narration: Use Eleven v3 with emotional tags like [whispers] or [laughs] to create immersive audiobooks or RPG dialogue.
  • Real-time Interaction: Utilize Turbo v2.5 for low-latency conversational agents that require near-instant response times.
  • Content Creation: Generate custom sound effects or background music loops for video projects, games, or social media content.
  • Accessibility: Transform written documentation or chat responses into high-quality spoken audio for users who prefer listening over reading.

Example Prompts

  1. "Generate a deep, gravelly voice reading this text about space exploration, and please use the Eleven v3 model to ensure the narrator sounds genuinely excited at the end: [excited] It was the greatest discovery of our lifetime!"
  2. "Create a 10-second lo-fi hip hop beat loop that I can use as background music for my productivity stream, and save it as an opus file for high quality."
  3. "Produce a sound effect of a heavy metal door slamming shut, and output the file as a standard 44.1kHz MP3."

Tips & Limitations

  • Audio Tags: Only the eleven_v3 model supports the square-bracket tag system. With other models, the tags are read aloud as literal text.
  • Format Selection: For web-based playback or AirPlay compatibility, prefer the opus_48000_192 format. For general compatibility across older devices, stick to mp3_44100_128.
  • Performance: While Flash v2.5 is the fastest and most cost-effective model, prioritize Eleven v3 for creative storytelling tasks to take advantage of the sophisticated emotional, tonal, and cadence control it offers.
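The tips above amount to a simple decision rule. One way to sketch it (the model IDs follow ElevenLabs' public naming convention; verify them against the skill's own documentation before relying on them):

```python
def pick_model(task, needs_audio_tags=False, low_latency=False):
    """Rough heuristic from the tips above: v3 for expressive narration,
    Flash v2.5 when latency and cost dominate, Turbo v2.5 otherwise."""
    if needs_audio_tags or task == "storytelling":
        return "eleven_v3"          # only model that honors [whispers]-style tags
    if low_latency:
        return "eleven_flash_v2_5"  # fastest and most cost-effective
    return "eleven_turbo_v2_5"      # balanced default

def pick_format(target):
    """Format choice per the tips: Opus for web/AirPlay playback,
    MP3 for broad compatibility with older devices."""
    return "opus_48000_192" if target in ("web", "airplay") else "mp3_44100_128"

print(pick_model("storytelling"))  # eleven_v3
print(pick_format("web"))          # opus_48000_192
```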

Metadata

Author: @odrobnik
Stars: 1,287
Views: 5
Updated: 2026-02-22
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-odrobnik-elevenlabs": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#audio #speech #tts #music #elevenlabs
Safety Score: 4/5

Flags: external-api, file-write