Official · Verified · media · Safety 4/5

elevenlabs-transcribe

Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription of local files, real-time streaming from URLs, and microphone input.

Why use this skill?

Transcribe audio files, live streams, and microphone input with the ElevenLabs OpenClaw skill. Supports 90+ languages and speaker diarization.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/paulasjes/elevenlabs-transcribe

What This Skill Does

The elevenlabs-transcribe skill connects OpenClaw agents to ElevenLabs' Scribe speech-to-text engine, letting an agent convert spoken language into text. It handles three input modes: batch processing of local audio files, real-time streaming from live URLs, and direct microphone input.

It supports more than 90 languages and adds speaker diarization (labeling which speaker said what), audio event tagging (e.g. laughter or music), and JSON output with word-level timestamps. That makes it suitable both for producing plain transcripts and for agent workflows that need to analyze what was said, by whom, and when.
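This page does not spell out the JSON schema. As a hedged sketch, assuming the output follows the shape of ElevenLabs' Speech-to-Text API response (a words list whose entries carry text, start, end, type and, with diarization on, speaker_id), word-level entries could be collapsed into speaker turns like this. The function name speaker_turns is hypothetical and not part of the skill.

```python
from typing import Any


def speaker_turns(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collapse word-level entries into consecutive same-speaker turns.

    Assumes (not confirmed by this page) entries shaped like
    {"text": ..., "start": ..., "end": ..., "type": ..., "speaker_id": ...}
    where type is "word", "spacing", or "audio_event".
    """
    turns: list[dict[str, Any]] = []
    for w in response.get("words", []):
        kind = w.get("type", "word")
        if kind == "audio_event":
            continue  # skip tags such as "(laughter)"
        if kind == "spacing":
            if turns:
                turns[-1]["text"] += w["text"]  # whitespace is normalized below
            continue
        speaker = w.get("speaker_id", "unknown")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker: extend the current turn to this word.
            turns[-1]["text"] += w["text"]
            turns[-1]["end"] = w["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({"speaker": speaker, "start": w["start"],
                          "end": w["end"], "text": w["text"]})
    for turn in turns:
        # Collapse runs of whitespace and trim the ends.
        turn["text"] = " ".join(turn["text"].split())
    return turns
```

Audio events are dropped here for readability; keep them instead if a meeting summary should mention laughter or music.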

Installation

To set up the skill:

  1. Install ffmpeg globally (for example, via Homebrew: brew install ffmpeg).
  2. Set your ElevenLabs API key in an environment variable named ELEVENLABS_API_KEY.
  3. Run the install command in your terminal:

clawhub install openclaw/skills/skills/paulasjes/elevenlabs-transcribe

The skill installs its Python dependencies automatically the first time it runs.
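To confirm the two prerequisites before the first run, a small Python check like the one below may help. It only looks for an ffmpeg executable on PATH and a non-empty ELEVENLABS_API_KEY; it does not validate the key with ElevenLabs. The helper name preflight is hypothetical and not part of the skill.

```python
import os
import shutil
from collections.abc import Callable, Mapping
from typing import Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of missing prerequisites; empty means both were found."""
    problems: list[str] = []
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # An unset or empty variable counts as missing.
    if not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set")
    return problems
```

Calling preflight() with no arguments checks the real environment; the env and which parameters exist only so the check can be exercised with stand-in values.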

Use Cases

  • Meeting Intelligence: Automatically transcribe recorded video conferencing calls with speaker diarization to generate meeting summaries.
  • Content Creation: Quickly convert raw interview audio or podcast recordings into production-ready transcripts.
  • Live Monitoring: Stream and monitor live audio feeds (radio or web streams) for specific keywords or data points.
  • Agent Voice Interaction: Enable your OpenClaw agent to "listen" to your microphone, allowing you to give voice commands instead of typing prompts.
  • Archive Indexing: Process large libraries of local media files into searchable text formats.
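For the archive-indexing use case, one option (not part of the skill) is an inverted index built from word-level JSON transcripts, mapping each word to the files and start times where it occurs. This sketch assumes each transcript is a dict with a words list whose entries have text, start (in seconds) and type fields, as in ElevenLabs' Speech-to-Text API responses; adjust the keys if the skill's JSON differs. The name build_index is hypothetical.

```python
import re
from collections import defaultdict


def build_index(transcripts: dict[str, dict]) -> dict[str, list[tuple[str, float]]]:
    """Map each lowercased word to its (file name, start seconds) occurrences."""
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for name, response in transcripts.items():
        for w in response.get("words", []):
            if w.get("type") != "word":
                continue  # ignore spacing and audio-event entries
            # Normalize: lowercase and strip leading/trailing punctuation.
            token = re.sub(r"^\W+|\W+$", "", w["text"].lower())
            if token:
                index[token].append((name, w["start"]))
    return dict(index)
```

A lookup such as index.get("roadmap", []) then returns every file and timestamp where that word was spoken.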

Example Prompts

  1. "Transcribe the meeting recording 'client_call.mp3' and make sure to identify who said what using speaker diarization."
  2. "Listen to my microphone for the next minute and summarize everything I say regarding the project roadmap."
  3. "Transcribe the live radio stream at [URL] and provide the output in JSON format with timestamps."

Tips & Limitations

  • Quiet Output: When running the skill inside an automated agent loop, pass the --quiet flag to keep logs clean.
  • Format Flexibility: The tool accepts most common audio and video formats (e.g. MP4, WAV, MOV), which makes it practical for mixed-media projects.
  • Hard Limits: Each file must stay under 3GB and 10 hours in duration. Split longer recordings into smaller parts before processing.
  • Accuracy: For non-English audio, specify the language code explicitly with the --lang flag to improve transcription accuracy.
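When a recording exceeds the size or duration limits, ffmpeg (already a prerequisite) can cut it into fixed-length parts with its segment muxer. The sketch below only builds the command list; the helper names needs_split and split_command are hypothetical, and reading "3GB" as 3 * 1024**3 bytes is an assumption.

```python
import os

# Limits from the tips above. Interpreting "3GB" as 3 GiB is an assumption.
MAX_BYTES = 3 * 1024**3
MAX_SECONDS = 10 * 60 * 60


def needs_split(size_bytes: int, duration_s: float) -> bool:
    """True if a file reaches either limit (treated as over, to be safe)."""
    return size_bytes >= MAX_BYTES or duration_s >= MAX_SECONDS


def split_command(src: str, segment_s: int = 3600) -> list[str]:
    """Build an ffmpeg command that cuts src into roughly segment_s-second parts."""
    ext = os.path.splitext(src)[1]  # keep the source container, e.g. ".mp4"
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",              # ffmpeg's segment muxer
        "-segment_time", str(segment_s),
        "-reset_timestamps", "1",     # each part starts at t=0
        "-c", "copy",                 # no re-encoding; cuts land on packet/keyframe boundaries
        f"part_%03d{ext}",
    ]
```

Run it with subprocess.run(split_command("talk.mp4"), check=True). Because -c copy avoids re-encoding, parts are only approximately segment_s long, so index * segment_s is an approximate offset if you later merge the per-part timestamps.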

Metadata

Author: @paulasjes
Stars: 1217
Views: 3
Updated: 2026-02-20
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-paulasjes-elevenlabs-transcribe": {
      "enabled": true,
      "auto_update": true
    }
  }
}
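If you prefer to script this step instead of pasting, a small Python helper can merge the entry into an existing clawhub.json without overwriting other plugins. This assumes clawhub.json is plain JSON with a top-level plugins object, as in the snippet above; the function names enable_plugin and enable_in_file are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-paulasjes-elevenlabs-transcribe"


def enable_plugin(config: dict) -> dict:
    """Return a copy of config with this plugin enabled; other keys are kept."""
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    plugins = merged.setdefault("plugins", {})
    entry = plugins.setdefault(PLUGIN_ID, {})
    entry.update({"enabled": True, "auto_update": True})
    return merged


def enable_in_file(path: Path) -> None:
    """Read clawhub.json (or start empty), enable the plugin, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Existing settings inside the plugin's own entry are preserved; only enabled and auto_update are overwritten.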

Tags (AI)

#transcription #speech-to-text #elevenlabs #audio #ai-agent
Safety Score: 4/5

Flags: file-read, external-api