
YouTube Model Feeder

Food for your model — extract transcripts, key frames, OCR, slides, and LLM summaries from YouTube videos into structured AI-ready knowledge.


Install via CLI (Recommended)

```shell
clawhub install openclaw/skills/skills/celstnblacc/youtube-model-feeder
```

Or add the plugin entry to your `clawhub.json` (see "Add to Configuration" below).

YouTube Model Feeder

Food for your model.

Stop pausing videos every 30 seconds to take a screenshot, paste it into Obsidian, and caption it. A 20-minute tutorial shouldn't take an hour to document.

YouTube Model Feeder extracts everything from a YouTube video — timestamped transcript, key frame snapshots, OCR of code and slides, presentation slide detection, and LLM-generated summaries — and packages it into structured knowledge your AI assistant can search, reference, and reason about.

Why This Exists

The problem isn't transcription — ten tools do that. The problem is structured context. When you feed a raw transcript to a model, it has no visual context. It doesn't know what was on screen when the speaker said "as you can see here." It can't read the code in the terminal, the diagram on the slide, or the config file being edited.

YouTube Model Feeder captures all of that. The output isn't just text — it's a knowledge bundle: transcript segments aligned to timestamps, screenshots of every key moment, OCR text from code snippets and slides, and an LLM summary that ties it all together.
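The "knowledge bundle" described above could be modeled roughly as follows. This is a minimal sketch, not the skill's actual schema: the class and field names (`TranscriptSegment`, `Frame`, `KnowledgeBundle`, `segments_at`) are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TranscriptSegment:
    start: float          # seconds
    end: float
    text: str

@dataclass
class Frame:
    timestamp: float      # seconds into the video
    image_path: str       # saved JPEG snapshot
    ocr_text: str = ""    # Tesseract output; empty if nothing readable

@dataclass
class KnowledgeBundle:
    title: str
    transcript: list[TranscriptSegment] = field(default_factory=list)
    frames: list[Frame] = field(default_factory=list)
    summary_markdown: str = ""

    def segments_at(self, t: float) -> list[TranscriptSegment]:
        """Transcript segments covering time t — aligns speech to a frame."""
        return [s for s in self.transcript if s.start <= t < s.end]
```

The key design point is the timestamp alignment: given any captured frame, the bundle can answer "what was being said when this was on screen."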

Combined with obsidian-semantic-search (also on ClawHub), every video you watch becomes permanently searchable by meaning in your Obsidian vault.

What It Extracts

Full Pipeline

| Step | Tool | What it produces |
| --- | --- | --- |
| Download | yt-dlp | Video + audio + metadata (title, duration, thumbnail) |
| Transcribe | Whisper (Ollama) or YouTube captions | Timestamped transcript segments |
| Frame extraction | FFmpeg | Key frame snapshots every 5s (configurable) |
| Slide detection | SSIM analysis (OpenCV) | Identifies presentation slides via structural similarity between frames |
| OCR | Tesseract | Reads code, terminal output, and text from captured frames |
| LLM summary | Ollama / OpenAI / Anthropic | Structured markdown with sections, code blocks, and key takeaways |
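The frame-extraction step in the table above can be sketched with FFmpeg's `fps` filter, which samples one frame per interval. This is an illustrative sketch, not the skill's actual internals; the function name and output naming scheme are assumptions.

```python
def frame_extract_cmd(video: str, out_dir: str, interval_s: int = 5) -> list[str]:
    """Build an ffmpeg command that saves one JPEG every `interval_s` seconds."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"fps=1/{interval_s}",   # sample one frame per interval
        "-q:v", "2",                    # high JPEG quality
        f"{out_dir}/frame_%05d.jpg",    # zero-padded sequential filenames
    ]
```

Run with `subprocess.run(frame_extract_cmd("talk.mp4", "frames"))` after creating the output directory; the resulting filenames map back to timestamps as `index * interval_s`.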

Slide Detection (Deep)

Not just frame captures — intelligent slide boundary detection:

  1. Layout detection — classifies video as full-frame, picture-in-picture, or split panel
  2. SSIM transition scan — compares consecutive frames for structural changes (threshold: SSIM < 0.85)
  3. LLM disambiguation — borderline transitions (0.85–0.93 SSIM) sent to LLM for classification
  4. Slide grouping — merges transitions into slides with enforced minimum duration (3s)
  5. Final-state capture — saves the last frame of each slide as JPEG
  6. OCR extraction — runs Tesseract on each slide image
  7. Transcript alignment — maps transcript segments to slide time ranges
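Steps 2–4 above can be sketched in a few lines using the stated thresholds (SSIM < 0.85 for a clear transition, 0.85–0.93 borderline, 3s minimum slide duration). This is one plausible reading of the pipeline, not the skill's actual code: here "enforced minimum duration" is interpreted as dropping spans shorter than the minimum, and the function names are assumptions.

```python
def classify_transition(ssim: float, hard: float = 0.85, soft: float = 0.93) -> str:
    """Classify a consecutive-frame SSIM score per the documented thresholds."""
    if ssim < hard:
        return "transition"   # clear structural change -> new slide
    if ssim < soft:
        return "borderline"   # would be sent to the LLM for disambiguation
    return "stable"           # same slide

def group_slides(transitions: list[float], video_end: float,
                 min_dur: float = 3.0) -> list[tuple[float, float]]:
    """Merge transition timestamps into (start, end) slide spans,
    discarding spans shorter than `min_dur` seconds."""
    bounds = [0.0] + sorted(transitions) + [video_end]
    return [(start, end)
            for start, end in zip(bounds, bounds[1:])
            if end - start >= min_dur]
```

Each surviving `(start, end)` span then feeds steps 5–7: capture the frame just before `end`, OCR it, and attach the transcript segments falling inside the span.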

Output Formats

Metadata

- Stars: 3875
- Views: 1
- Updated: 2026-04-07
Add to Configuration

Paste this into your `clawhub.json` to enable this plugin.

```json
{
  "plugins": {
    "official-celstnblacc-youtube-model-feeder": {
      "enabled": true,
      "auto_update": true
    }
  }
}
```
**Safety Note:** ClawKit audits metadata but not runtime behavior. Use with caution.