Official Verified media Safety 4/5

youtube-transcription-generator

Use VLM Run (vlmrun) to generate transcriptions from YouTube videos. Download a video with yt-dlp, then run vlmrun to transcribe with optional timestamps. VLMRUN_API_KEY must be in .env; follow vlmrun-cli-skill for CLI setup and options.

Why use this skill?

Convert YouTube videos to text using VLM Run and yt-dlp. Generate accurate transcripts with optional timestamps for productivity and analysis.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/mehediahamed/youtube-transcription-generator

What This Skill Does

The YouTube Transcription Generator is an agent skill that turns YouTube videos into text transcripts. It combines yt-dlp, which downloads the video, with VLM Run (Orion visual AI), which produces the transcript through the vlmrun CLI. Output can be plain text or include timestamps, which suits summarizing lectures, meetings, or educational content. Once you provide a URL, the agent handles the download, file management, and API calls.

Installation

To get started, make sure you are running Python 3.10 or higher. Navigate to the skill directory within your OpenClaw environment and create a virtual environment with uv venv. Activate the environment, then install the dependencies with uv pip install -r requirements.txt. Obtain an API key from VLM Run and store it as VLMRUN_API_KEY in a .env file that follows the provided .env_template. Finally, confirm that both the yt-dlp and vlmrun CLI tools are available on your PATH.
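The prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the skill itself: the helper names read_env_file and preflight are hypothetical, and it assumes the .env file uses plain KEY=VALUE lines.

```python
import os
import shutil
import sys
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def preflight(env_path=".env", tools=("yt-dlp", "vlmrun")):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or higher is required")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Accept the key from the .env file or from the process environment
    key = read_env_file(env_path).get("VLMRUN_API_KEY") or os.environ.get("VLMRUN_API_KEY")
    if not key:
        problems.append("VLMRUN_API_KEY is missing from .env and the environment")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("problem:", problem)
```

Running the script from the skill directory prints one line per missing prerequisite and prints nothing when everything is in place.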

Use Cases

  • Academic Research: Transcribe hour-long educational lectures into searchable text documents to extract key concepts.
  • Content Creation: Generate accurate transcriptions of your own videos for closed captioning, blog post repurposing, or metadata optimization.
  • Business Intelligence: Quickly process meeting recordings or product demonstration videos hosted on YouTube to identify action items or customer feedback.
  • Accessibility: Create text versions of video content for users who prefer reading or rely on assistive technology.

Example Prompts

  1. "Please transcribe the YouTube video at https://www.youtube.com/watch?v=dQw4w9WgXcQ and save the output to my documents folder."
  2. "Can you watch this tech tutorial video https://www.youtube.com/watch?v=example and give me a full transcript with timestamps for every major step?"
  3. "Transcribe this lecture video and format it as a clean text file so I can easily summarize it later."

Tips & Limitations

  • API Usage: Be aware that long videos consume more tokens and may incur higher costs through the VLM Run API.
  • Performance: For very long videos, consider downloading only the audio track via yt-dlp to reduce bandwidth and processing time if the visual component is unnecessary.
  • Environment: Always verify your VLMRUN_API_KEY is correctly set in your environment variables if you experience authentication errors during execution.
  • Formatting: Timestamps are optional, but requesting them makes it easier to navigate transcripts of long videos that cover multiple topics.
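The audio-only tip can be sketched as a small wrapper around yt-dlp. This is an illustrative example under stated assumptions: the function names are hypothetical, the mp3 format and output template are arbitrary defaults, ffmpeg is assumed to be installed for the audio conversion, and whether your vlmrun setup accepts audio input should be checked against vlmrun-cli-skill.

```python
import subprocess


def build_audio_download_command(url, output_template="%(id)s.%(ext)s", audio_format="mp3"):
    """Build a yt-dlp command that fetches and keeps only the audio track."""
    return [
        "yt-dlp",
        "--format", "bestaudio/best",    # prefer an audio-only stream when one exists
        "--extract-audio",               # convert the download to an audio file (needs ffmpeg)
        "--audio-format", audio_format,
        "--output", output_template,     # yt-dlp output filename template
        url,
    ]


def download_audio(url, **kwargs):
    """Run yt-dlp; raises subprocess.CalledProcessError if the download fails."""
    subprocess.run(build_audio_download_command(url, **kwargs), check=True)
```

Keeping the command builder separate from the call that runs it makes the arguments easy to inspect or log before any network access happens.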

Metadata

Stars: 1401
Views: 0
Updated: 2026-02-24

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-mehediahamed-youtube-transcription-generator": {
      "enabled": true,
      "auto_update": true
    }
  }
}
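If clawhub.json already contains other settings, the entry can be merged in rather than pasted by hand. The sketch below assumes the file sits in the current directory and has the "plugins" layout shown above; the enable_plugin helper is hypothetical and not part of ClawHub.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-mehediahamed-youtube-transcription-generator"


def enable_plugin(config_path="clawhub.json", plugin_id=PLUGIN_ID):
    """Add or update the plugin entry while preserving all other settings."""
    path = Path(config_path)
    # Start from the existing config if present, otherwise from an empty one
    config = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling enable_plugin() rewrites the file with two-space indentation, so existing comments or custom formatting (if any) are not preserved.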

Tags (AI)

#transcription #youtube #vlmrun #speech-to-text #automation
Safety Score: 4/5

Flags: network-access, file-write, file-read, external-api