Official Verified media Safety 4/5

video-understanding

Analyze videos with Google Gemini multimodal AI. Download from any URL (Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, 1000+ sites) and get transcripts, descriptions, and answers to questions. Use when asked to watch, analyze, summarize, or transcribe a video, or answer questions about video content. Triggers on video URLs or requests involving video understanding.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/bill492/video-understanding


What This Skill Does

The video-understanding skill uses Google Gemini's multimodal AI to analyze video content from across the web. Whether the source is a long-form YouTube tutorial, a short TikTok clip, or a product demonstration on Loom, the skill automates the pipeline of downloading, processing, and interpreting the video. It returns a structured JSON response containing a timestamped transcript, a visual description, a summary, and speaker identification.
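To make the structured response more concrete, here is a minimal sketch of how such JSON might be consumed. The listing does not document the schema, so every key name below (summary, description, speakers, transcript, timestamp, speaker, text) and the sample values are assumptions made up for illustration; check the skill's actual output before relying on them.

```python
import json

# Hypothetical sample: key names and values are invented for illustration only.
SAMPLE_RESPONSE = """
{
  "summary": "A short walkthrough of a settings page.",
  "description": "Screen recording of a web application.",
  "speakers": ["Speaker 1"],
  "transcript": [
    {"timestamp": "00:00", "speaker": "Speaker 1", "text": "Let's open settings."},
    {"timestamp": "00:12", "speaker": "Speaker 1", "text": "Now toggle dark mode."}
  ]
}
"""


def format_transcript(raw_json):
    """Turn transcript entries into '[timestamp] speaker: text' lines."""
    data = json.loads(raw_json)
    lines = []
    # Assumed layout: "transcript" is a list of objects with three string fields.
    for entry in data.get("transcript", []):
        lines.append(f"[{entry['timestamp']}] {entry['speaker']}: {entry['text']}")
    return lines


if __name__ == "__main__":
    for line in format_transcript(SAMPLE_RESPONSE):
        print(line)
```

If the real output uses different field names, only the lookups inside format_transcript would need to change.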

Installation

To add this skill to your OpenClaw agent, run the following command in your terminal:

clawhub install openclaw/skills/skills/bill492/video-understanding

Make sure yt-dlp and ffmpeg are installed on your system (for example, with Homebrew: brew install yt-dlp ffmpeg). You must also set a valid GEMINI_API_KEY environment variable so the skill can authenticate with the Gemini API.
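Before running the skill, it can help to confirm these prerequisites are in place. The following is a minimal sketch of such a check, not part of the skill itself; the function name missing_prerequisites is hypothetical. It only looks for the two executables on PATH and for a non-empty GEMINI_API_KEY, and does not verify that the key is actually valid.

```python
import os
import shutil

REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")
REQUIRED_ENV = ("GEMINI_API_KEY",)


def missing_prerequisites(which=shutil.which, environ=None):
    """Return the names of required tools and variables that are unavailable.

    `which` and `environ` are injectable so the check can be exercised
    without touching the real system.
    """
    env = os.environ if environ is None else environ
    # A tool counts as missing when it cannot be found on PATH.
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    # A variable counts as missing when it is unset or empty.
    missing += [name for name in REQUIRED_ENV if not env.get(name)]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing prerequisites: " + ", ".join(problems))
    else:
        print("All prerequisites found.")
```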

Use Cases

Content Repurposing: Generate written blog posts or social media copy from recorded video meetings.
Learning & Research: Quickly extract key takeaways or answers to specific questions from educational videos without watching the entire duration.
Content Moderation/Compliance: Identify visual elements, UI patterns, or speakers within a video library.
Accessibility: Create automated transcripts and visual descriptions for archived media that lacks metadata.

Example Prompts

"Watch this YouTube tutorial on Python decorators and give me a 3-sentence summary of the main takeaway."
"Can you watch this Loom video and list every step the user took in the settings menu?"
"Transcribe this video from Twitter and identify all the speakers mentioned in the conversation."

Tips & Limitations

YouTube Efficiency: For YouTube URLs, the skill skips the download step and passes the URL directly to Gemini, which saves the time otherwise spent downloading and uploading the video.
Handling Large Files: The Gemini File API supports large video files, but upload time depends on your connection speed, so expect very large files to take a while before analysis starts.
Customization: If the default JSON output is too verbose, use the -p flag to override the prompt and get concise, raw text tailored to your specific needs.
Cost Considerations: Since this relies on Gemini's API, ensure your account has sufficient quota if you intend to process a high volume of long-duration videos.
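The YouTube shortcut described in the tips above amounts to a routing decision based on the URL's host. A minimal sketch of that decision follows; the function name choose_route and the set of hostnames are assumptions for illustration, and the skill's own detection logic may differ.

```python
from urllib.parse import urlparse

# Assumption: hostnames treated as YouTube for this illustration.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def choose_route(url):
    """Return "direct" for YouTube URLs and "download" for everything else."""
    # hostname is None for strings that do not parse as a URL with a host.
    host = (urlparse(url).hostname or "").lower()
    return "direct" if host in YOUTUBE_HOSTS else "download"


if __name__ == "__main__":
    for candidate in (
        "https://www.youtube.com/watch?v=abc",
        "https://www.loom.com/share/abc",
    ):
        print(candidate, "->", choose_route(candidate))
```

In this sketch, anything that is not recognized as YouTube falls back to the download path, which matches the tip's description of YouTube as the special case.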


Metadata

Author: @bill492

Stars: 4473

Updated: 2026-05-01


Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-bill492-video-understanding": {
      "enabled": true,
      "auto_update": true
    }
  }
}
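If clawhub.json already lists other plugins, pasting the snippet verbatim would overwrite them, so the entry needs to be merged instead. The sketch below shows one way to do that on an in-memory dictionary; the helper name enable_plugin is hypothetical, and reading and writing the file itself is left out because the listing does not say where clawhub.json lives.

```python
import json

PLUGIN_NAME = "official-bill492-video-understanding"


def enable_plugin(config, name=PLUGIN_NAME):
    """Return a copy of config with the plugin entry added under "plugins"."""
    # Round-trip through JSON to deep-copy the JSON-compatible input.
    updated = json.loads(json.dumps(config))
    plugins = updated.setdefault("plugins", {})
    plugins[name] = {"enabled": True, "auto_update": True}
    return updated


if __name__ == "__main__":
    # "some-other-plugin" is a made-up entry standing in for existing config.
    existing = {"plugins": {"some-other-plugin": {"enabled": False}}}
    print(json.dumps(enable_plugin(existing), indent=2))
```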

Tags (AI)

#video #gemini #transcription #multimodal #ai-analysis

Safety Score: 4/5

Flags: network-access, file-write, file-read, external-api

Related Skills

skill-audit

Audit all installed skills for quality, duplicates, structural issues, and best-practice compliance. Use when asked to review, audit, lint, or check skills for problems. Triggers on "audit skills", "skill quality", "check my skills", "skill duplicates", "skill hygiene".

bill492 4473

browser-read-x

Extract the main X/Twitter post or article content from a page that is already open in the browser (using browser act evaluate).

bill492 4473

cf-crawl

Crawl websites using Cloudflare Browser Rendering /crawl API. Async multi-page crawl with markdown/HTML/JSON output, link following, pattern filtering, and AI-powered structured data extraction. Use when crawling entire sites or multiple pages, building knowledge bases, extracting structured data from websites, or when web_fetch is insufficient (JS rendering, multi-page, authenticated crawls).

bill492 4473

sub-agents

Spawn and coordinate sub-agent sessions for parallel work. Use when delegating tasks (research, code, analysis), routing to appropriate models, or managing multi-agent workflows. Trigger on "spawn", "sub-agent", "delegate", "parallel tasks", or when a task would benefit from a different model.

bill492 4473

browser-read

Extract readable content from browser pages as markdown. Use when web_fetch fails (bot protection, auth-required pages, Twitter/X, LinkedIn) and you already have the page open in the browser.

bill492 4473