What This Skill Does

The elevenlabs-transcribe skill is a speech-to-text integration for the OpenClaw agent ecosystem. It uses ElevenLabs' transcription engine to convert speech to text from three kinds of input: local audio files (batch processing), live URLs (real-time streaming), and direct microphone input. It supports over 90 languages and offers speaker diarization (distinguishing between speakers), event tagging (e.g., laughter or music), and JSON output with word-level timestamps. This makes it suitable both for plain transcription and for agent workflows that analyze audio input.

Installation

To integrate this skill into your environment, install ffmpeg globally (e.g., via brew install ffmpeg) and set your ElevenLabs API key in an environment variable named ELEVENLABS_API_KEY. Then run the following command in your terminal:

    clawhub install openclaw/skills/skills/paulasjes/elevenlabs-transcribe

The skill installs its Python dependencies automatically on first execution.
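The two prerequisites named above (ffmpeg on the PATH and the ELEVENLABS_API_KEY variable) can be verified before installing. The following is a minimal sketch, not part of the skill itself; the function name check_prerequisites and its messages are hypothetical and chosen only for illustration.

```python
import os
import shutil


def check_prerequisites(env=None):
    """Return a list of problems; an empty list means both prerequisites are met.

    Hypothetical helper for illustration only, not shipped with the skill.
    """
    env = os.environ if env is None else env
    problems = []
    # ffmpeg must be discoverable on the PATH of the given environment.
    if shutil.which("ffmpeg", path=env.get("PATH")) is None:
        problems.append("ffmpeg not found on PATH")
    # The API key must be set and non-empty.
    if not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

Running the script prints one line per missing prerequisite and nothing when the environment is ready.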

Use Cases

Meeting Intelligence: Automatically transcribe recorded video conferencing calls with speaker diarization to generate meeting summaries.
Content Creation: Quickly convert raw interview audio or podcast recordings into production-ready transcripts.
Live Monitoring: Stream and monitor live audio feeds (radio or web streams) for specific keywords or data points.
Agent Voice Interaction: Enable your OpenClaw agent to "listen" to your microphone, allowing you to give voice commands instead of typing prompts.
Archive Indexing: Process large libraries of local media files into searchable text formats.
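For the archive-indexing case, a first step is usually to collect the media files to hand to the skill. The sketch below only covers that discovery step and does not invoke the skill. The name find_media_files and the extension set are assumptions made for illustration, so adjust the set to match your library.

```python
from pathlib import Path

# Assumption: a small illustrative set of extensions, not the skill's
# authoritative list of supported formats.
MEDIA_EXTENSIONS = {".mp3", ".mp4", ".wav", ".mov"}


def find_media_files(root):
    """Recursively collect media files under root, sorted for stable ordering."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS
    )
```

The resulting list can then be fed to the agent one file at a time, which keeps each transcription task independent.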

Example Prompts

"Transcribe the meeting recording 'client_call.mp3' and make sure to identify who said what using speaker diarization."
"Listen to my microphone for the next minute and summarize everything I say regarding the project roadmap."
"Transcribe the live radio stream at [URL] and provide the output in JSON format with timestamps."

Tips & Limitations

Efficiency: Use the --quiet flag when running this skill inside an automated agent loop to keep logs clean.
Format Flexibility: The tool supports most major audio and video containers (MP4, WAV, MOV, etc.), so it works well for mixed-media projects.
Hard Limits: Keep each file under the 3GB size limit and the 10-hour duration cap per task. Split longer files before processing.
Accuracy: For non-English audio, always explicitly provide the language code using the --lang flag to maximize transcription precision.
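The hard limits above lend themselves to a quick pre-flight check. This is a minimal sketch under stated assumptions: "3GB" is read conservatively as decimal gigabytes, the duration is supplied by the caller (for example, from a media probe), and the names within_limits and segments_needed are hypothetical.

```python
import math

# Assumption: "3GB" interpreted as 3 * 1000**3 bytes (the conservative reading).
MAX_BYTES = 3 * 1000**3
# 10-hour duration cap, expressed in seconds.
MAX_SECONDS = 10 * 60 * 60


def within_limits(size_bytes, duration_seconds):
    """True when both the size and the duration fit the documented caps."""
    return size_bytes <= MAX_BYTES and duration_seconds <= MAX_SECONDS


def segments_needed(duration_seconds, max_seconds=MAX_SECONDS):
    """Smallest number of equal-length pieces that each fit the duration cap."""
    return max(1, math.ceil(duration_seconds / max_seconds))
```

For example, a 25-hour recording gives segments_needed(25 * 3600) == 3, so it would be split into three pieces before processing.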
