transcribee
Transcribe YouTube videos and local audio/video files with speaker diarization. Use when user asks to transcribe a YouTube URL, podcast, video, or audio file. Outputs clean speaker-labeled transcripts ready for LLM analysis.
Why use this skill?
Convert audio and YouTube videos into speaker-labeled transcripts with OpenClaw. Features diarization, word-level timings, and automated metadata extraction.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/itsfabioroma/transcribee
What This Skill Does
The Transcribee skill is a high-precision audio and video processing agent designed for OpenClaw. It serves as a bridge between raw media files—whether sourced from local storage or public YouTube links—and structured text suitable for Large Language Model (LLM) analysis. By utilizing ElevenLabs' advanced diarization technology, Transcribee distinguishes between multiple speakers in a recording, assigning clear labels so you can follow the flow of conversation. Beyond simple text conversion, the skill outputs a structured directory containing raw text, word-level timings in JSON format, and comprehensive metadata regarding the source media, ensuring that the resulting data is ready for archival, summarization, or deep research.
Installation
To integrate Transcribee into your OpenClaw environment, first install the two system dependencies: yt-dlp, which fetches media from YouTube, and ffmpeg, which handles audio and video processing. With Homebrew, open a terminal and run:
brew install yt-dlp ffmpeg
Once these are installed, you can add the skill to your agent via the OpenClaw command line interface:
clawhub install openclaw/skills/skills/itsfabioroma/transcribee
Finally, make sure your ElevenLabs API key is configured in the .env file within the Transcribee directory so the skill can authenticate with the transcription engine.
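Before the first run, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of the skill itself: it checks that yt-dlp and ffmpeg are on the PATH and that a dotenv-style file defines a non-empty key. The variable name ELEVENLABS_API_KEY and the assumption that the script runs from the Transcribee directory are both hypothetical; check the skill's own files for the real names.

```python
import shutil
from pathlib import Path

# The two system dependencies named in the installation steps.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def env_has_key(env_text, key_name):
    """Return True if dotenv-style text defines `key_name` with a non-empty value."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key_name and value.strip().strip("'\""):
            return True
    return False


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    env_file = Path(".env")  # assumption: run from the Transcribee directory
    text = env_file.read_text() if env_file.exists() else ""
    # ELEVENLABS_API_KEY is a hypothetical variable name used for illustration.
    print("API key set:", env_has_key(text, "ELEVENLABS_API_KEY"))
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is sufficient.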
Use Cases
Transcribee is an essential tool for content creators, researchers, and developers who need to parse audio information efficiently.
- Journalism and Research: Automatically transcribe long-form interviews or panel discussions with speaker diarization to maintain accuracy in quotes and attributions.
- Content Repurposing: Quickly turn YouTube video tutorials or podcasts into comprehensive blog posts, documentation, or newsletters by generating a speaker-labeled transcript.
- Meeting Analysis: Process recorded video meetings or voice notes to extract actionable insights, action items, and executive summaries using the output .txt files.
- Accessibility: Create accessible text versions of audio-visual media for better indexing or for viewers who prefer reading over watching.
Example Prompts
- "Transcribee, please process this interview: https://www.youtube.com/watch?v=sampleID. Make sure to capture both the interviewer and the guest accurately."
- "Hey, can you transcribe my meeting audio file located at ~/Documents/recordings/project_sync.mp3 and organize the files in the standard output folder?"
- "I need a breakdown of the video at this URL: "https://www.youtube.com/watch?v=video_link&feature=shared". Please extract the transcription so I can summarize the key points later."
Tips & Limitations
- URL Formatting: When dealing with YouTube links that contain tracking parameters (often denoted by &), always wrap the URL in quotes to prevent your shell from misinterpreting the characters.
- Quality Matters: The accuracy of the speaker diarization depends heavily on audio quality. Clear, high-fidelity audio will result in significantly fewer errors compared to background-heavy recordings.
- Storage: Remember that every transcription generates multiple files. Periodically clear your ~/Documents/transcripts/ directory to manage storage effectively.
- API Keys: If you encounter unexpected failures, double-check that your ElevenLabs API key is correctly configured in your environment variables, as the service requires valid authentication for every request.
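The URL formatting tip can be illustrated with a short sketch. An unquoted & tells most shells to run the preceding command in the background, so everything after it is cut off from the URL. The helpers below are hypothetical and not part of Transcribee: one quotes a URL for safe pasting into a shell, the other drops every query parameter except the video ID (v), assuming a standard watch URL.

```python
import shlex
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def shell_safe(url):
    """Quote a URL so characters such as & and ? are not interpreted by the shell."""
    return shlex.quote(url)


def strip_extra_params(url, keep=("v",)):
    """Return the URL with only the query parameters listed in `keep`."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key in keep]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    url = "https://www.youtube.com/watch?v=video_link&feature=shared"
    print(shell_safe(url))
    print(strip_extra_params(url))
```

Either approach avoids the problem: quoting preserves the link exactly as shared, while stripping removes the tracking parameter so no & remains.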
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-itsfabioroma-transcribee": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Flags: network-access, file-write, file-read, external-api
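If clawhub.json already lists other plugins, pasting the fragment by hand risks overwriting them. As a rough sketch, the entry can be merged into an existing configuration instead. The plugin ID and the two settings come from the fragment above; the existing configuration shown here is hypothetical, and how clawhub locates and loads the file is not covered.

```python
import json

# Plugin ID taken from the clawhub.json fragment in this document.
PLUGIN_ID = "official-itsfabioroma-transcribee"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of `config` with the plugin enabled, keeping other entries."""
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    updated["plugins"] = plugins
    return updated


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated plugin.
    existing = {"plugins": {"some-other-plugin": {"enabled": False}}}
    print(json.dumps(enable_plugin(existing), indent=2))
```

The function copies the top-level and plugins dictionaries rather than mutating its input, so the original configuration can still be written back unchanged if something goes wrong.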