captions
Extract closed captions and subtitles from YouTube videos. Use when the user asks for captions, closed captions, CC, accessibility text, or wants to read what was said in a video. Supports timestamps and multiple languages. Great for deaf/HoH accessibility, content review, quoting, and translation.
Why use this skill?
Extract closed captions and transcripts from YouTube videos with the OpenClaw captions skill. It supports timestamps, metadata, and JSON output for accessibility work and content research.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/therohitdas/captions
What This Skill Does
The captions skill lets OpenClaw programmatically extract closed captions and subtitle data from YouTube videos through TranscriptAPI. It bridges video content and text-based analysis, so users can work with what was said in a video without watching it or transcribing it by hand. It supports multiple languages, can optionally include timestamps for precise synchronization, and returns either structured JSON for developer workflows or formatted text for reading. With the skill installed, OpenClaw can turn a video into text you can search, quote, and analyze.
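To illustrate the difference between the two output styles, here is a minimal sketch that renders JSON-style caption segments as reading text, with or without timestamps. It assumes each segment is a dict with `text` and `start` (in seconds) keys; `render_text` is a hypothetical helper, and the text format TranscriptAPI actually returns may look different.

```python
def render_text(segments: list[dict], include_timestamp: bool = False) -> str:
    """Render caption segments as reading text, optionally with [m:ss] stamps.

    Assumption: segments look like {"text": str, "start": seconds, ...}.
    """
    if not include_timestamp:
        # Plain reading mode: join all caption text into one paragraph.
        return " ".join(seg["text"] for seg in segments)
    lines = []
    for seg in segments:
        # Prefix each line with the segment's start time as minutes:seconds.
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes}:{seconds:02d}] {seg['text']}")
    return "\n".join(lines)


segments = [
    {"text": "Hello", "start": 1.2, "duration": 2.0},
    {"text": "world", "start": 65.0, "duration": 1.0},
]
print(render_text(segments, include_timestamp=True))
```

Joining untimed text into a single paragraph favors reading; keeping one segment per line favors scanning for a specific moment.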
Installation
To add this skill to your environment, run: clawhub install openclaw/skills/skills/therohitdas/captions. During installation, the system checks for a TRANSCRIPT_API_KEY. If none is found, the agent walks you through a secure authentication flow: it asks for your email address, then verifies a 6-digit OTP before generating your credentials. The credentials are stored in ~/.openclaw/openclaw.json, and the file is backed up automatically to protect your existing configuration.
Use Cases
- Accessibility: Empowering deaf and hard-of-hearing users by providing full, searchable text alternatives to video content.
- Content Review: Quickly summarizing long-form YouTube videos, lectures, or interviews without watching them end to end.
- Translation & Research: Extracting raw transcript data for use in secondary translation workflows or linguistic analysis.
- Quoting: Precisely citing specific dialogue from videos using the provided duration and start-time metadata.
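For quoting, the start time of a segment can be turned into a link that opens the video at that moment, using YouTube's `t=` URL parameter. This is a minimal sketch assuming each JSON segment is a dict with `text`, `start`, and `duration` keys measured in seconds; `quote_with_link` is a hypothetical helper, not part of the skill.

```python
from urllib.parse import urlencode


def quote_with_link(video_id: str, segment: dict) -> str:
    """Format one caption segment as a quote with a timestamped YouTube link.

    Assumption: segments look like {"text": str, "start": seconds, "duration": seconds}.
    """
    start = int(segment["start"])  # whole seconds for the t= URL parameter
    hours, rem = divmod(start, 3600)
    minutes, seconds = divmod(rem, 60)
    # Human-readable stamp: m:ss, or h:mm:ss for videos longer than an hour.
    stamp = f"{hours}:{minutes:02d}:{seconds:02d}" if hours else f"{minutes}:{seconds:02d}"
    url = "https://www.youtube.com/watch?" + urlencode({"v": video_id, "t": f"{start}s"})
    return f'"{segment["text"]}" ({stamp}) {url}'


print(quote_with_link("abc123", {"text": "Hello there", "start": 43.2, "duration": 2.5}))
```

Truncating to whole seconds keeps the link on or just before the quoted line, so the reader never lands after it.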
Example Prompts
- "Can you give me the full transcript for the video at [YouTube URL] so I can read it instead of watching?"
- "What exactly did the speaker say in the last 30 seconds of this video? Please include timestamps."
- "Summarize the key points from this YouTube video by extracting the captions, and let me know if there are any mentions of AI ethics."
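A request like "what did the speaker say in the last 30 seconds" comes down to arithmetic on the start and duration keys. The sketch below assumes each JSON segment is a dict with `text`, `start`, and `duration`, with times in seconds; `last_n_seconds` is a hypothetical helper. It measures from the end of the last caption, which can fall short of the end of the video if the final seconds contain no speech.

```python
def last_n_seconds(segments: list[dict], n: float = 30.0) -> list[dict]:
    """Return the caption segments that overlap the final n seconds of captions.

    Assumption: segments look like {"text": str, "start": seconds, "duration": seconds}.
    """
    if not segments:
        return []
    # End of the captioned content: the latest start + duration of any segment.
    end = max(s["start"] + s["duration"] for s in segments)
    cutoff = end - n
    # Keep every segment still on screen after the cutoff, including partial overlaps.
    return [s for s in segments if s["start"] + s["duration"] > cutoff]


segments = [
    {"text": "a", "start": 0.0, "duration": 10.0},
    {"text": "d", "start": 65.0, "duration": 10.0},
    {"text": "c", "start": 80.0, "duration": 20.0},
]
for seg in last_n_seconds(segments):
    print(seg["start"], seg["text"])
```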
Tips & Limitations
For technical integrations such as a video-to-text overlay, use format=json: each caption segment then carries start and duration keys, which you need to keep the text synchronized with playback. For a clean reading experience during a quick review, use format=text with include_timestamp=false. Auto-generated captions are available for most videos, but their accuracy depends on YouTube's speech-recognition engine; manual captions uploaded by creators are usually more accurate and better punctuated.
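As one example of using the start and duration keys for synchronization, the sketch below converts JSON segments into SubRip (.srt) subtitles, a widely supported format that players can display in time with the video. SRT is a choice made for illustration, not an output format of this skill; the segment shape (`text`, `start`, `duration`, times in seconds) and both helper names are assumptions.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text: numbered cues, each shown from start to start + duration.

    Assumption: segments look like {"text": str, "start": seconds, "duration": seconds}.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{seg['text']}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


print(segments_to_srt([
    {"text": "Hello", "start": 1.0, "duration": 2.5},
    {"text": "World", "start": 4.0, "duration": 1.0},
]))
```

Rounding to whole milliseconds keeps cue times exact for the precision SRT can express.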
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-therohitdas-captions": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, external-api, network-access
Related Skills
slack-personal
Read, send, search, and manage Slack messages and DMs via the slk CLI. Use when the user asks to check Slack, read channels or DMs, send Slack messages, search Slack, check unreads, manage drafts, view saved items, or interact with Slack workspace. Also use for heartbeat Slack checks. Triggers on "check slack", "any slack messages", "send on slack", "slack unreads", "search slack", "slack threads", "draft on slack", "read slack dms", "message on slack".
video-transcript
Extract full transcripts from video content for analysis, summarization, note-taking, or research. Use when the user wants a written version of video content, asks to "transcribe this", "get the text from this video", "convert video to text", or shares a video URL for content extraction.
transcriptapi
Full TranscriptAPI toolkit — fetch YouTube transcripts, search videos and channels, browse channel uploads, get latest videos, and explore playlists. Use when the user wants to work with YouTube content programmatically, get transcripts for summarization or analysis, find videos, or monitor channels. Triggers on YouTube URLs, "transcript", "transcriptapi", "video summary", "what did they say", "find videos about", "search youtube".
youtube-data
Access YouTube video data — transcripts, metadata, channel info, search, and playlists. A lightweight alternative to Google's YouTube Data API with no quota limits. Use when the user needs structured data from YouTube videos, channels, or playlists without dealing with Google API setup, OAuth, or daily quotas.
youtube-full
Complete YouTube toolkit — transcripts, search, channels, playlists, and metadata all in one skill. Use when you need comprehensive YouTube access, want to search and then get transcripts, browse channel content, work with playlists, or need the full suite of YouTube data endpoints. The all-in-one YouTube skill for agents.