TubeScribe
YouTube video summarizer with speaker detection, formatted documents, and audio output. Works out of the box with macOS built-in TTS. Optional but recommended tools (pandoc, ffmpeg, mlx-audio) improve output quality. Requires internet for YouTube access. No paid APIs or subscriptions. Use when the user sends a YouTube URL or asks to summarize or transcribe a YouTube video.
Why use this skill?
Transform any YouTube video into a polished document with speaker detection, timestamps, and audio summaries. Free, local, and private processing.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/matusvojtek/tubescribe
What This Skill Does
TubeScribe transforms any YouTube video into an accessible, structured knowledge base. It acts as an automated researcher that handles transcription, speaker diarization, and summary generation entirely on your local machine. By leveraging YouTube's metadata and captions, it generates high-quality text documents—available in DOCX, HTML, or Markdown—paired with audio summaries for portable listening. It is designed to handle long-form content like interviews, educational lectures, and news, making them searchable and easier to digest without needing to watch the video from start to finish.
Installation
Ensure your system has Python 3 installed. Run the setup script to verify all local dependencies:
python3 skills/tubescribe/scripts/setup.py
This script checks for the recommended tools, including pandoc for document formatting, ffmpeg for media handling, and the Kokoro TTS engine for generating audio summaries. You can also install the skill package directly using the OpenClaw hub command: clawhub install openclaw/skills/skills/matusvojtek/tubescribe.
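As a rough illustration of what a dependency check of this kind might look like, the sketch below uses the standard library's shutil.which to report which command-line tools are on PATH. It is not the actual contents of setup.py; the function name find_tools and the tool list are assumptions made for this example.

```python
import shutil

# Hypothetical tool list, taken from the tools named in this description.
RECOMMENDED_TOOLS = ["pandoc", "ffmpeg"]


def find_tools(names, which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH.

    The `which` parameter exists only so the lookup can be swapped out in tests.
    """
    return {name: which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(RECOMMENDED_TOOLS).items():
        status = path if path else "not found (optional)"
        print(f"{name}: {status}")
```

A missing tool is reported rather than treated as an error, which matches the description of these tools as optional enhancements.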
Use Cases
- Academic Research: Quickly summarize long university lectures and extract key timestamps for specific topics.
- Content Curation: Generate concise written reports from hours-long podcast interviews for later review.
- Accessibility: Create audio summaries for users who prefer listening to content rather than watching visual media.
- Corporate Meetings: Process recorded YouTube meetings or webinars into formatted documents with identified speakers and key quotes.
Example Prompts
- "Summarize this video and give me the key points: [YouTube URL]"
- "Create a transcript of this interview, label the speakers, and generate an audio summary for my commute: [YouTube URL]"
- "Can you watch this lecture for me, write a document with clickable timestamps, and extract the main quotes? [YouTube URL]"
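The clickable timestamps requested in the last prompt can be built from a video ID and an offset in seconds, since YouTube watch URLs accept a t parameter. The following is a minimal sketch under that assumption; the helper names format_timestamp, timestamp_url, and markdown_link are hypothetical and are not part of TubeScribe.

```python
from urllib.parse import urlencode


def format_timestamp(seconds: int) -> str:
    """Render an offset in seconds as M:SS, or H:MM:SS past one hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timestamp_url(video_id: str, seconds: int) -> str:
    """Build a watch URL that starts playback at the given offset."""
    query = urlencode({"v": video_id, "t": f"{int(seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def markdown_link(video_id: str, seconds: int) -> str:
    """Combine the label and URL into a Markdown link for the document."""
    return f"[{format_timestamp(seconds)}]({timestamp_url(video_id, seconds)})"
```

For example, markdown_link("abc123", 90) yields a link labelled 1:30 that opens the video 90 seconds in; "abc123" is a placeholder video ID.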
Tips & Limitations
- Non-Blocking Workflow: TubeScribe runs as a sub-agent. Feel free to continue chatting with OpenClaw while the transcription processes in the background.
- Privacy First: Because transcription and summarization happen locally, your data is never sent to external servers for processing. Network access is used only to reach YouTube.
- Limitations: The skill requires internet access to fetch video metadata and captions. Ensure you are connected to the network when requesting a new summary. Results depend on the quality of the video's original captions.
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-matusvojtek-tubescribe": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, code-execution
Related Skills
Briefing Room
Daily news briefing generator — produces a conversational radio-host-style audio briefing + DOCX document covering weather, X/Twitter trends, web trends, world news, politics, tech, local news, sports, markets, and crypto. macOS only (uses Apple TTS and afplay). Use when user asks for a news briefing, morning briefing, daily update, or similar.
Her Voice
Give your agent a voice. Use when the user wants the agent to speak, read aloud, or have voice responses.