elevenlabs-speech
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/amreahmed/elevenlabs-voice
What This Skill Does
The elevenlabs-speech skill adds voice processing to OpenClaw by integrating the ElevenLabs API for both Text-to-Speech (TTS) and Speech-to-Text (STT). Your AI agent can produce human-like, high-fidelity audio from text across a range of emotional tones and voice profiles, and it can transcribe voice messages sent by users. Models such as 'eleven_turbo_v2_5' provide fast, low-latency output suitable for real-time interactions. The skill also supports multilingual content and speaker diarization.
Installation
To integrate this skill into your environment, use the ClawHub CLI tool. Ensure you have your ElevenLabs API key ready, which can be acquired from the ElevenLabs platform. Run the following command in your terminal:
clawhub install openclaw/skills/skills/amreahmed/elevenlabs-voice
Once installed, provide your credentials by setting an environment variable in your terminal: export ELEVENLABS_API_KEY="your_api_key_here". Alternatively, add the same variable to a .env file in your project's root directory so the agent can authenticate with the API.
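A minimal sketch of how a script might resolve the key, checking the environment first and then a .env file. The helper name load_api_key and the simple KEY=value parsing are assumptions made for illustration; the skill's own loading logic is not described here.

```python
import os
from pathlib import Path


def load_api_key(env_file=".env", name="ELEVENLABS_API_KEY"):
    """Return the key from the environment, else from a .env file.

    Hypothetical helper: the skill itself may resolve the key differently.
    """
    value = os.environ.get(name)
    if value:
        return value

    path = Path(env_file)
    if path.is_file():
        for raw in path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Tolerate shell-style "export KEY=value" lines.
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, val = line.partition("=")
            if key.strip() == name:
                # Drop optional surrounding quotes.
                return val.strip().strip('"').strip("'")

    raise RuntimeError(f"{name} is not set in the environment or in {env_file}")
```

Checking the environment first means a value exported in the terminal overrides whatever is stored in the file, which is a common convention but still a design choice of this sketch.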
Use Cases
- Voice Assistants: Create conversational interfaces that respond to text queries with natural, expressive vocal output.
- Content Accessibility: Automatically convert long-form documents or articles into high-quality audio files for listeners.
- Transcription Services: Process incoming voice memos or audio files to generate searchable, formatted text transcripts, including support for identifying multiple speakers.
- Multilingual Support: Deliver spoken content in multiple languages with authentic accents using the specialized multilingual model.
Example Prompts
- "Convert the following text into an audio file using the voice of Rachel: 'Welcome to the system, how can I help you today?' and save it as welcome.mp3."
- "Transcribe this voice message found at audio_input.ogg and identify how many people are speaking in the recording."
- "Summarize the audio file located in my downloads folder after transcribing it to text."
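The second prompt relies on speaker diarization. As a rough sketch, counting speakers could look like the following, assuming the transcription result is a dictionary with a "words" list whose entries carry a "speaker_id" field. That shape reflects my understanding of diarized ElevenLabs output and should be checked against the API reference; count_speakers and the sample data are hypothetical.

```python
def count_speakers(transcript):
    """Count distinct speaker labels in a diarized transcript dict.

    Assumes each entry in transcript["words"] may carry a "speaker_id".
    """
    speakers = {
        word["speaker_id"]
        for word in transcript.get("words", [])
        if word.get("speaker_id") is not None
    }
    return len(speakers)


# Illustrative data only, not real API output.
sample = {
    "text": "Hi there. Hello.",
    "words": [
        {"text": "Hi", "speaker_id": "speaker_0"},
        {"text": "there.", "speaker_id": "speaker_0"},
        {"text": "Hello.", "speaker_id": "speaker_1"},
    ],
}

print(count_speakers(sample))  # prints 2
```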
Tips & Limitations
To get the best results, experiment with the stability and similarity_boost settings. Lowering stability often produces more emotive, expressive speech, whereas raising it gives a more consistent but flatter delivery. Choose the model that fits your use case: use eleven_multilingual_v2 for non-English content to maintain pronunciation accuracy. This skill needs an active internet connection to reach the ElevenLabs cloud servers, and high-volume use beyond the free tier requires a paid subscription.
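To make the settings concrete, here is a hedged sketch that assembles a text-to-speech request without sending it. The endpoint path, the xi-api-key header, and the body field names follow my understanding of the public ElevenLabs REST API; the default values, the 0 to 1 range check, and the build_tts_request helper are assumptions for illustration, not the skill's actual implementation.

```python
import json

# Assumed endpoint template; voice_id is a placeholder supplied by the caller.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text, voice_id, api_key,
                      stability=0.5, similarity_boost=0.75,
                      multilingual=False):
    """Build the URL, headers, and JSON body for a TTS call (not sent)."""
    for label, value in (("stability", stability),
                         ("similarity_boost", similarity_boost)):
        # Assumption: both settings are expected to lie between 0 and 1.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be between 0 and 1, got {value}")

    # Model choice mirrors the tip above: multilingual model for
    # non-English text, the turbo model otherwise.
    model_id = "eleven_multilingual_v2" if multilingual else "eleven_turbo_v2_5"

    body = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return {
        "url": API_URL.format(voice_id=voice_id),
        "headers": {"xi-api-key": api_key, "Content-Type": "application/json"},
        "body": json.dumps(body),
    }


# Lower stability for a more expressive read of Spanish text.
request = build_tts_request("Hola, ¿en qué puedo ayudarte?", "VOICE_ID",
                            "your_api_key_here", stability=0.3,
                            multilingual=True)
print(request["url"])
```

The returned dictionary can be handed to any HTTP client; keeping request construction separate from sending makes the settings easy to inspect and adjust before spending quota.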
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-amreahmed-elevenlabs-voice": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, external-api