youtube-transcriber
One-command YouTube video transcription. Automatically downloads audio and transcribes using OpenAI Whisper API — works even when YouTube subtitles are disabled. Use when asked to "transcribe this video", "get transcript", "what does this video say", or when YouTube captions are unavailable.
Why use this skill?
Convert a YouTube video to text with one command. The skill uses OpenAI Whisper to produce a transcript even when captions are disabled.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/edisonchenai/youtube-transcriber
What This Skill Does
The YouTube Transcriber skill converts a YouTube video into text with one command. Whether you need to extract information from a lecture, summarize a long-form video, or keep a reference copy of spoken dialogue, it handles the whole process. It first checks for native YouTube subtitles, which saves time and cost. If those are missing or disabled, it downloads the audio and sends it to the OpenAI Whisper API to generate a transcript. Transcription covers 99+ languages and works whether or not the video has captions enabled.
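The subtitle-first fallback described above can be sketched as follows. This is a minimal illustration, not the skill's actual code: `get_transcript`, `fetch_subtitles`, and `whisper_transcribe` are hypothetical names, and the two fetchers are passed in as plain callables so the control flow can be read on its own.

```python
from typing import Callable, Optional, Tuple


def get_transcript(
    url: str,
    fetch_subtitles: Callable[[str], Optional[str]],
    whisper_transcribe: Callable[[str], str],
    force_whisper: bool = False,
) -> Tuple[str, str]:
    """Return (transcript, source), preferring free native subtitles.

    All names here are hypothetical; the skill's real internals are not
    shown in this document.
    """
    if not force_whisper:
        subtitles = fetch_subtitles(url)
        if subtitles:  # None or "" means captions are missing or disabled
            return subtitles, "subtitles"
    # Fallback (or forced) path: download audio and send it to Whisper.
    return whisper_transcribe(url), "whisper"
```

Passing `force_whisper=True` mirrors the `--force-whisper` flag: the subtitle check is skipped and the paid Whisper path is always taken.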
Installation
To get started, make sure the system-level dependencies are installed. The script relies on yt-dlp for video/audio extraction and ffmpeg for audio conversion. Both can be installed via Homebrew (brew install yt-dlp ffmpeg); yt-dlp is also available through pip, and ffmpeg through most system package managers. You must also export OPENAI_API_KEY as an environment variable in your terminal session or shell configuration file. Once the prerequisites are met, install the skill via the OpenClaw hub: clawhub install openclaw/skills/skills/edisonchenai/youtube-transcriber.
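Before installing, it can help to confirm the prerequisites are in place. The following is a small sketch, assuming the executables are named `yt-dlp` and `ffmpeg` on your PATH; `missing_prerequisites` is a hypothetical helper and not part of the skill.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """List the prerequisites that appear to be missing.

    Hypothetical helper: checks PATH for the two tools named in the text
    and checks that OPENAI_API_KEY is set to a non-empty value.
    """
    env = os.environ if env is None else env
    missing = [tool for tool in ("yt-dlp", "ffmpeg") if which(tool) is None]
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("Ready to install." if not problems else f"Missing: {', '.join(problems)}")
```

An empty list means both tools were found and the key is set; the check does not verify that the key is valid.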
Use Cases
- Academic Research: Extracting transcripts from educational videos or webinars for study notes.
- Content Creation: Repurposing video content into blog posts or newsletters.
- Accessibility: Creating text transcripts for hearing-impaired users or for SEO indexing purposes.
- Efficiency: Quickly searching through lengthy video content by keywords without watching the entire video.
Example Prompts
- "Transcribe this video for me: https://www.youtube.com/watch?v=dQw4w9WgXcQ"
- "Can you give me a full transcript of the video I just sent, and please save it to my notes folder?"
- "What does the speaker say in this clip? Please provide a high-accuracy transcript using the Whisper API."
Tips & Limitations
- Cost Efficiency: Always leverage native subtitles when possible, as they are free. Use the --force-whisper flag only when high accuracy is required or native captions are insufficient.
- API Limits: Keep in mind that Whisper API usage costs roughly $0.006 per minute. Long videos may take slightly longer to process as audio is compressed to fit the 25MB upload constraint.
- Maintenance: If you receive a 403 error, run pip install -U yt-dlp immediately to update your extraction engine, as YouTube frequently updates their anti-scraping measures.
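The cost and size figures above lend themselves to a quick back-of-the-envelope calculation. The sketch below assumes billing is strictly proportional to audio duration (no rounding), treats 25MB as 25 × 1024 × 1024 bytes, and ignores container overhead; the function names are hypothetical.

```python
WHISPER_USD_PER_MINUTE = 0.006          # approximate rate quoted in the text
UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024   # assumption: 25MB read as 25 MiB


def estimate_cost_usd(duration_seconds: float) -> float:
    """Approximate Whisper cost, assuming purely per-minute pricing."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return duration_seconds / 60 * WHISPER_USD_PER_MINUTE


def max_bitrate_kbps(duration_seconds: float,
                     limit_bytes: int = UPLOAD_LIMIT_BYTES) -> float:
    """Highest average audio bitrate (kbit/s) that stays under the limit."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return limit_bytes * 8 / duration_seconds / 1000


if __name__ == "__main__":
    one_hour = 60 * 60
    print(f"cost: ${estimate_cost_usd(one_hour):.2f}")
    print(f"bitrate ceiling: {max_bitrate_kbps(one_hour):.1f} kbps")
```

Under these assumptions a 60-minute video costs about $0.36 and its audio must average roughly 58 kbps or less to fit in a single upload, which is why long videos need compression first.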
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-edisonchenai-youtube-transcriber": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, external-api
Related Skills
reddit-assistant
Reddit content creation assistant for indie developers and product builders. Creates authentic posts, researches communities, tracks real performance data via Reddit API. Triggers on: "write reddit post", "draft reddit", "post to reddit", "reddit content", "find subreddits for", "which subreddits", "check reddit performance", "reddit analytics", "reddit results", "log reddit post", "reddit post ideas", "reddit strategy"
protea-Self-evolving life agent
Self-evolving artificial life agent. Three-ring architecture: Ring 0 (Sentinel) supervises, Ring 1 (Intelligence) drives LLM-powered evolution, Ring 2 (Evolvable Code) is the living program that self-restructures, self-reproduces, and self-evolves. Supports Anthropic, OpenAI, DeepSeek, and Qwen as LLM providers. Includes fitness scoring, gene pool inheritance, tiered memory, skill crystallization, Telegram bot, and web dashboard.
Edison Autopilot Post X
Skill by edisonchenai
edison-youtube-full
Complete YouTube toolkit for agents: search videos, fetch metadata, browse channels and playlists, and pull transcripts. Use when you need comprehensive YouTube Data API access (search, channels, playlists) plus transcript extraction in a single workflow.
edison-agent-reach
Use the internet: search, read, and interact with 13+ platforms including Twitter/X, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu (小红书), Douyin (抖音), WeChat Articles (微信公众号), LinkedIn, Boss直聘, RSS, Exa web search, and any web page. Use when: (1) user asks to search or read any of these platforms, (2) user shares a URL from any supported platform, (3) user asks to search the web, find information online, or research a topic, (4) user asks to post, comment, or interact on supported platforms, (5) user asks to configure or set up a platform channel. Triggers: "搜推特", "搜小红书", "看视频", "搜一下", "上网搜", "帮我查", "全网搜索", "search twitter", "read tweet", "youtube transcript", "search reddit", "read this link", "看这个链接", "B站", "bilibili", "抖音视频", "微信文章", "公众号", "LinkedIn", "GitHub issue", "RSS", "search online", "web search", "find information", "research", "帮我配", "configure twitter", "configure proxy", "帮我安装".