elevenlabs-transcribe
Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.
Why use this skill?
Easily transcribe audio files, live streams, and microphone input with the ElevenLabs OpenClaw skill. Supports 90+ languages and diarization.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/paulasjes/elevenlabs-transcribe
What This Skill Does
The elevenlabs-transcribe skill is a speech-to-text integration for the OpenClaw agent ecosystem, built on ElevenLabs' Scribe transcription engine. It handles batch processing of local audio files, real-time streaming from live URLs, and direct microphone input. It supports over 90 languages and includes speaker diarization (distinguishing between speakers), event tagging (e.g., laughter or music), and JSON output with word-level timestamps, which makes it suitable for both plain transcription and agent workflows that need to analyze audio input.
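To illustrate how word-level timestamps and diarization can be combined, here is a minimal sketch that collapses consecutive words from the same speaker into turns. The field names (words, text, start, end, type, speaker_id) are an assumption modeled on the ElevenLabs speech-to-text response; the skill's own JSON output may be shaped differently, and the sample data is invented for illustration.

```python
from itertools import groupby

# Hypothetical sample; field names are assumed, not confirmed for this skill.
sample = {
    "text": "Hi there. Hello!",
    "words": [
        {"text": "Hi", "start": 0.0, "end": 0.2, "type": "word", "speaker_id": "speaker_0"},
        {"text": " ", "start": 0.2, "end": 0.25, "type": "spacing", "speaker_id": "speaker_0"},
        {"text": "there.", "start": 0.25, "end": 0.6, "type": "word", "speaker_id": "speaker_0"},
        {"text": "Hello!", "start": 0.9, "end": 1.3, "type": "word", "speaker_id": "speaker_1"},
    ],
}


def speaker_turns(transcript):
    """Collapse consecutive words from the same speaker into one turn."""
    # Keep only real words; spacing and audio-event entries are skipped.
    words = [w for w in transcript.get("words", []) if w.get("type") == "word"]
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.get("speaker_id")):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],   # start time of the first word in the turn
            "end": group[-1]["end"],      # end time of the last word in the turn
            "text": " ".join(w["text"] for w in group),
        })
    return turns


for turn in speaker_turns(sample):
    print(f'{turn["speaker"]} [{turn["start"]:.2f}-{turn["end"]:.2f}]: {turn["text"]}')
```

Grouping only consecutive words keeps the turns in chronological order, which is usually what a meeting summary needs.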
Installation
To integrate this skill into your environment, ensure you have ffmpeg installed globally (e.g., via brew install ffmpeg). Set your ElevenLabs API key as an environment variable named ELEVENLABS_API_KEY. Once configured, execute the following command in your terminal:
clawhub install openclaw/skills/skills/paulasjes/elevenlabs-transcribe
The skill installs its Python dependencies automatically the first time it runs.
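Before installing, it can help to confirm that both prerequisites are in place. The following is a small sketch, not part of the skill itself; the function name missing_prerequisites is hypothetical, and it checks only that ffmpeg is on the PATH and that ELEVENLABS_API_KEY is set to a non-empty value.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return the names of prerequisites that appear to be missing.

    env and which are injectable so the check can be exercised without
    touching the real environment.
    """
    env = os.environ if env is None else env
    missing = []
    if which("ffmpeg") is None:            # ffmpeg executable not found on PATH
        missing.append("ffmpeg")
    if not env.get("ELEVENLABS_API_KEY"):  # unset or empty API key
        missing.append("ELEVENLABS_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("Ready to install." if not problems else f"Missing: {', '.join(problems)}")
```

This check does not validate the key against the ElevenLabs API; it only catches the two most common setup omissions.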
Use Cases
- Meeting Intelligence: Automatically transcribe recorded video conferencing calls with speaker diarization to generate meeting summaries.
- Content Creation: Quickly convert raw interview audio or podcast recordings into production-ready transcripts.
- Live Monitoring: Stream and monitor live audio feeds (radio or web streams) for specific keywords or data points.
- Agent Voice Interaction: Enable your OpenClaw agent to "listen" to your microphone, allowing you to give voice commands instead of typing prompts.
- Archive Indexing: Process large libraries of local media files into searchable text formats.
Example Prompts
- "Transcribe the meeting recording 'client_call.mp3' and make sure to identify who said what using speaker diarization."
- "Listen to my microphone for the next minute and summarize everything I say regarding the project roadmap."
- "Transcribe the live radio stream at [URL] and provide the output in JSON format with timestamps."
Tips & Limitations
- Efficiency: Always use the --quiet flag when running this skill as part of an automated agent loop to keep logs clean and reduce unnecessary noise.
- Format Flexibility: The tool supports a wide array of formats, including most major audio and video containers (MP4, WAV, MOV, etc.), making it robust for mixed-media projects.
- Hard Limits: Ensure your files remain under the 3GB limit and 10-hour duration cap per task. For extremely long files, consider splitting them prior to processing.
- Accuracy: For non-English audio, always explicitly provide the language code using the --lang flag to maximize transcription precision.
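The hard limits above can be checked before a file is submitted. This is a rough sketch under stated assumptions: "3GB" is read as decimal gigabytes, a file exactly at a limit is treated as acceptable, and the helper names within_limits and parts_needed are hypothetical. The caller supplies the file size and duration (for example from os.path.getsize and a media probe).

```python
import math

MAX_BYTES = 3 * 1000**3      # assumption: "3GB" interpreted as 3 * 10^9 bytes
MAX_SECONDS = 10 * 60 * 60   # 10-hour duration cap


def within_limits(size_bytes, duration_seconds):
    """True if the file fits both the size and the duration limit."""
    return size_bytes <= MAX_BYTES and duration_seconds <= MAX_SECONDS


def parts_needed(size_bytes, duration_seconds):
    """Smallest number of equal parts that puts each part within both limits."""
    by_size = math.ceil(size_bytes / MAX_BYTES)
    by_time = math.ceil(duration_seconds / MAX_SECONDS)
    return max(1, by_size, by_time)


# A 25-hour recording of 1 MB needs three parts to respect the 10-hour cap.
print(within_limits(1_000_000, 25 * 3600), parts_needed(1_000_000, 25 * 3600))
```

The part count assumes a roughly constant bitrate, so that splitting by equal duration also splits the size evenly; variable-bitrate files may need a margin.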
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-paulasjes-elevenlabs-transcribe": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-read, external-api