What This Skill Does

The elevenlabs-speech skill adds voice processing to OpenClaw by integrating the ElevenLabs API for both Text-to-Speech (TTS) and Speech-to-Text (STT). Your AI agent can generate natural-sounding, high-fidelity audio from text across a range of voice profiles and emotional styles, and it can transcribe voice messages sent by users. Models such as eleven_turbo_v2_5 are aimed at fast, low-latency synthesis suitable for real-time interactions. The skill also supports multilingual output and, on the transcription side, speaker diarization.

Installation

Install the skill with the ClawHub CLI tool. You will need an ElevenLabs API key, which you can obtain from the ElevenLabs platform. Run the following command in your terminal:

clawhub install openclaw/skills/skills/amreahmed/elevenlabs-voice

Once installed, provide your credentials by setting the environment variable in your terminal:

export ELEVENLABS_API_KEY="your_api_key_here"

Alternatively, add the same variable to a .env file in your project's root directory so the agent can authenticate with the API.
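The lookup order described above (environment variable first, then a .env file) can be sketched in Python as follows. This is an illustration only, not the skill's actual loader: the function name load_api_key and its parameters are hypothetical, and the .env parsing covers only simple KEY=value lines.

```python
import os
from pathlib import Path


def load_api_key(env_file=".env", environ=None):
    """Return ELEVENLABS_API_KEY from the environment, else from a .env file.

    Hypothetical helper for illustration; the skill may resolve the key differently.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("ELEVENLABS_API_KEY")
    if key:
        return key

    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines, comments, and anything that is not KEY=value.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            name = name.strip()
            # Tolerate lines written as `export KEY=value`.
            if name.startswith("export "):
                name = name[len("export "):].strip()
            if name == "ELEVENLABS_API_KEY":
                # Drop surrounding single or double quotes, if any.
                return value.strip().strip("\"'")

    raise RuntimeError("ELEVENLABS_API_KEY is not set in the environment or .env")
```

Failing loudly when the key is missing makes a misconfigured install easier to spot than a later authentication error from the API.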

Use Cases

Voice Assistants: Create conversational interfaces that respond to text queries with natural, expressive vocal output.
Content Accessibility: Automatically convert long-form documents or articles into high-quality audio files for listeners.
Transcription Services: Process incoming voice memos or audio files to generate searchable, formatted text transcripts, including support for identifying multiple speakers.
Multilingual Support: Deliver spoken content in multiple languages with authentic accents using the eleven_multilingual_v2 model.
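For the transcription use case, identifying multiple speakers usually comes down to post-processing a diarized transcript. The sketch below assumes the transcript arrives as a list of word entries, each with a "text" and a "speaker_id" field. That shape is an assumption made for illustration, not something this page confirms, and the function names are hypothetical.

```python
from itertools import groupby


def count_speakers(words):
    """Count distinct speaker labels in a diarized word list (assumed shape)."""
    return len({w["speaker_id"] for w in words if w.get("speaker_id") is not None})


def speaker_turns(words):
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    labelled = [w for w in words if w.get("speaker_id") is not None]
    return [
        (speaker, " ".join(w["text"] for w in group))
        for speaker, group in groupby(labelled, key=lambda w: w["speaker_id"])
    ]


# Hand-written sample data, not real API output.
sample = [
    {"text": "Hello", "speaker_id": "speaker_0"},
    {"text": "there", "speaker_id": "speaker_0"},
    {"text": "Hi", "speaker_id": "speaker_1"},
    {"text": "Bye", "speaker_id": "speaker_0"},
]

print(count_speakers(sample))
for speaker, text in speaker_turns(sample):
    print(f"{speaker}: {text}")
```

Grouping only consecutive words keeps the turn order of the conversation, so a speaker who talks twice appears as two separate turns.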

Example Prompts

"Convert the following text into an audio file using the voice of Rachel: 'Welcome to the system, how can I help you today?' and save it as welcome.mp3."
"Transcribe this voice message found at audio_input.ogg and identify how many people are speaking in the recording."
"Summarize the audio file located in my downloads folder after transcribing it to text."

Tips & Limitations

To get the best results, experiment with the stability and similarity_boost settings. Lower stability tends to produce more emotive, expressive speech, whereas higher stability gives a more consistent but flatter delivery. Choose the model that fits your use case: for non-English content, use eleven_multilingual_v2 to maintain pronunciation accuracy. Note that this skill needs an active internet connection to reach the ElevenLabs cloud servers, and high-volume use beyond the free tier requires a paid subscription.
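To make the settings above concrete, here is a minimal sketch of how a text-to-speech request carrying stability, similarity_boost, and a model choice might be assembled. It only builds the request and does not send it. The endpoint path, the xi-api-key header, and the 0.0 to 1.0 range follow the public ElevenLabs REST API as best understood here; the function name, the default values, and the language-based model switch are hypothetical, and the skill itself may work differently.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, api_key, *, language="en",
                      stability=0.5, similarity_boost=0.75):
    """Build (but do not send) a text-to-speech request.

    Hypothetical helper; default values are illustrative, not recommendations.
    """
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost)):
        # Assumption: both settings are expected in the range 0.0 to 1.0.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    # Use the multilingual model for non-English text, as the tip above suggests.
    model_id = "eleven_turbo_v2_5" if language == "en" else "eleven_multilingual_v2"

    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen and writing the response bytes to a file such as welcome.mp3 would complete the call. The voice_id must be a real voice identifier from your ElevenLabs account, which this sketch does not supply.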
