What This Skill Does

The elevenlabs-speech skill adds voice input and output to the OpenClaw AI agent through the ElevenLabs API. It covers Text-to-Speech (TTS) for generating natural-sounding speech and Speech-to-Text (STT), via Scribe, for transcribing audio. Whether you need to voice your agent's responses for Telegram or transcribe incoming voice notes from a team, the skill exposes controls for voice identity, stability (how expressive or consistent the delivery is), and language.

Installation

To integrate this capability into your environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/jeffpignataro/miranda-elevenlabs-speech

Have your ElevenLabs API key ready. Export it as the environment variable ELEVENLABS_API_KEY, or add it to your .env file, so the client can authenticate with the service automatically.
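
The lookup order described above (environment variable first, then a .env file) can be sketched in Python as follows. This is a minimal illustration, not the skill's actual loader: the function name load_api_key and the simple KEY=value parsing are assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_file: str = ".env") -> Optional[str]:
    """Return ELEVENLABS_API_KEY from the environment, else from a .env file.

    Hypothetical helper for illustration; the skill may resolve the key differently.
    """
    key = os.environ.get("ELEVENLABS_API_KEY")
    if key:
        return key

    path = Path(env_file)
    if not path.is_file():
        return None

    # Parse simple KEY=value lines, skipping blanks and comments.
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "ELEVENLABS_API_KEY":
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None
```

Checking the environment first means a key exported in the shell overrides whatever is stored in the .env file, which is a common convention but still an assumption about this skill.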

Use Cases

Automated Communication: Convert agent text responses into natural-sounding voice files for seamless integration with messaging platforms like Telegram.
Voice Note Transcription: Automatically parse voice messages sent by users or team members, turning audio input into actionable text for the AI agent.
Multilingual Support: Use the eleven_multilingual_v2 model to synthesize speech in multiple languages.
Accessibility: Provide auditory feedback for users who prefer listening to content over reading, improving overall agent accessibility.
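
For readers curious what a text-to-speech call with eleven_multilingual_v2 might look like outside the agent, here is a rough Python sketch against the public ElevenLabs REST endpoint. The function names build_tts_request and synthesize are hypothetical, the voice_id is a placeholder you would look up in your own ElevenLabs account, and the skill itself may call the API in a different way.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, api_key, model_id="eleven_multilingual_v2"):
    """Build (but do not send) a text-to-speech request.

    Hypothetical helper; voice_id is a placeholder, not a real voice.
    """
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,  # ElevenLabs authenticates with this header
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
    )


def synthesize(text, voice_id, api_key, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

Separating request construction from sending keeps the payload easy to inspect before any characters are spent on a real call.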

Example Prompts

"Convert my last message to an audio file using the Rachel voice and send it to me as a voice note on Telegram."
"Transcribe this audio file located at /downloads/voice_note.ogg and summarize the key action items for me."
"Say 'System update complete' using an authoritative voice like Arnold."

Tips & Limitations

To get the best results, experiment with the stability and similarity_boost settings. Lower stability values suit emotional, expressive speech, while higher values give a consistent, professional tone. Transcription quality with Scribe depends on audio clarity; background noise can reduce the accuracy of STT results. Monitor your ElevenLabs usage, because speech generation consumes characters or credits according to your subscription plan.
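
As a starting point for that experimentation, the sketch below models the two settings as a small validated Python type. The class name VoiceSettings, the two presets, and their specific values are illustrative assumptions rather than recommendations from ElevenLabs or from this skill; the only constraint relied on is that both settings are fractions between 0.0 and 1.0.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the two tuning knobs mentioned above."""

    stability: float
    similarity_boost: float

    def __post_init__(self):
        # Reject values outside the 0.0 to 1.0 range before they reach the API.
        for name in ("stability", "similarity_boost"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    def to_payload(self):
        """Return the fragment to merge into a text-to-speech request body."""
        return {"voice_settings": asdict(self)}


# Illustrative presets only: lower stability for expressive delivery,
# higher stability for a steadier tone.
EXPRESSIVE = VoiceSettings(stability=0.3, similarity_boost=0.75)
CONSISTENT = VoiceSettings(stability=0.8, similarity_boost=0.75)
```

Treat the preset numbers as a first guess and adjust them by ear for each voice.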
