Official · Verified · media · Safety 4/5

gemini-video-analyzer

Native video analysis using Google Gemini API. Upload and analyze video files — describe scenes, extract text/UI, answer questions about content, transcribe speech, identify objects and actions. Use when: (1) User sends a video file and wants it analyzed, (2) Video summarization or description needed, (3) Extracting text, UI elements, or information from screen recordings, (4) Answering questions about video content, (5) Comparing multiple videos, (6) Analyzing tutorials, demos, or walkthroughs.

What This Skill Does

The Gemini Video Analyzer is a multimodal agent skill that interprets video files without manual frame extraction. It sends the video to Google's Gemini API, which samples frames at 1 frame per second by default and processes them together with the audio track, so the model can reason about motion and changes over time as well as what is visible in individual frames. Whether the input is a screen recording, an instructional walkthrough, or creative media, the skill turns it into text: scene descriptions, structured summaries, or answers to specific questions, which makes raw video files queryable.
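This listing does not show the skill's source, so the following is not its implementation. It is a minimal sketch of the same upload, wait, and prompt flow using Google's google-genai Python SDK, assuming the SDK calls `client.files.upload(file=...)`, `client.files.get(name=...)`, `client.files.delete(name=...)`, and `client.models.generate_content(model=..., contents=...)`; check these against your installed SDK version. The helper name `analyze_video` and its parameters are hypothetical. The client is passed in rather than created inside the function so the polling logic can be exercised with a stand-in object.

```python
import time
from typing import Any


def analyze_video(client: Any, path: str, prompt: str,
                  model: str = "gemini-2.5-flash",
                  poll_seconds: float = 5.0,
                  timeout_seconds: float = 600.0,
                  delete_after: bool = False) -> str:
    """Hypothetical helper: upload a video, wait for processing, then prompt Gemini about it.

    `client` is assumed to behave like google-genai's `genai.Client`. It is passed in
    (rather than created here) so the polling logic can be tested without network access.
    """
    video = client.files.upload(file=path)
    deadline = time.monotonic() + timeout_seconds
    # Uploaded videos go through server-side processing before they can be referenced.
    while video.state.name == "PROCESSING":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{video.name} still processing after {timeout_seconds} s")
        time.sleep(poll_seconds)
        video = client.files.get(name=video.name)
    if video.state.name == "FAILED":
        raise RuntimeError(f"Gemini could not process {video.name}")
    try:
        # The uploaded file and the text prompt are sent together as the request contents.
        response = client.models.generate_content(model=model, contents=[video, prompt])
        return response.text
    finally:
        # Files expire after 48 hours on their own; delete immediately if requested.
        if delete_after:
            client.files.delete(name=video.name)


# Usage with the real SDK (assumes `pip install google-genai` and network access):
#   import os
#   from google import genai
#   client = genai.Client(api_key=os.environ["GOOGLE_AI_API_KEY"])
#   print(analyze_video(client, "bug_report.mp4",
#                       "List the exact steps the user took to trigger the login error."))
```

Injecting the client keeps the network-dependent pieces at the edge, and the optional `delete_after` flag lets privacy-sensitive callers remove the upload instead of waiting for automatic expiry.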

Installation

To integrate this skill into your OpenClaw environment, use the command-line interface to pull it from the official repository:

clawhub install openclaw/skills/skills/aiwithabidi/gemini-video-analyzer

After installing, configure API authentication: generate a free API key at aistudio.google.com, then either export it as an environment variable in your shell or define it in your .env file:

export GOOGLE_AI_API_KEY="your_key_here"
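The listing does not say which source wins if the key is set in both places. The sketch below assumes an exported variable takes precedence over `.env`, and its parser understands only simple `KEY=value` lines (optionally prefixed with `export` or quoted); a library such as python-dotenv covers the full syntax. `load_api_key` is a hypothetical helper, not part of the skill.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_file: str = ".env", var: str = "GOOGLE_AI_API_KEY") -> Optional[str]:
    """Hypothetical helper: read the key from the environment, else from a simple .env file."""
    # Assumption: an exported shell variable takes precedence over the .env file.
    value = os.environ.get(var)
    if value:
        return value
    path = Path(env_file)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, _, val = line.partition("=")
        if key.strip() == var:
            # Drop optional surrounding quotes, e.g. GOOGLE_AI_API_KEY="abc".
            return val.strip().strip('"').strip("'")
    return None
```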

The skill is now ready to process supported video formats, including MP4, AVI, MOV, and WebM, in files up to 2 GB.
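Checking a file locally before uploading avoids a failed round trip. A minimal sketch, assuming the four listed formats and the 2 GB ceiling are the constraints that matter: it inspects only the file extension, not the actual container format, and treats 2 GB as 2,000,000,000 bytes to stay on the safe side. Because the format list says "including", other formats may also be accepted; this sketch deliberately allows only the four named. `check_video` is a hypothetical name.

```python
from pathlib import Path

# Assumptions taken from the listing: these four formats, and a conservative 2 GB limit.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".webm"}
MAX_BYTES = 2_000_000_000


def check_video(path: str) -> None:
    """Hypothetical pre-flight check: raise ValueError if the file should not be uploaded."""
    p = Path(path)
    if not p.is_file():
        raise ValueError(f"not a file: {path}")
    # Extension check only; a renamed file with the wrong container would still pass.
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format {p.suffix!r}; expected one of {sorted(SUPPORTED_EXTENSIONS)}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{p.name} is {size:,} bytes, over the 2 GB limit")
```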

Use Cases

Quality Assurance: Analyze bug report screen recordings to identify visual anomalies or UI crashes.
Educational Research: Automatically generate step-by-step summaries from complex software tutorials or instructional demonstrations.
Content Archiving: Transcribe spoken audio and index key visual elements from meeting recordings or lecture videos.
User Experience Testing: Extract UI element changes or interaction flows from product demos for design comparison.

Example Prompts

"Can you watch this screen recording and provide a bulleted list of the exact steps the user took to trigger the login error?"
"Analyze this product demo video and summarize the three most important features mentioned by the speaker."
"Review this video of a UI walkthrough and list all the buttons and text labels visible in the main navigation menu."

Tips & Limitations

Model Selection: Use the default gemini-2.5-flash for general, high-speed tasks. For highly complex visual logic or dense technical content, use the --model gemini-2.5-pro flag.
Data Privacy: Files are uploaded to Google's Files API and are automatically deleted after 48 hours.
Limitations: The model handles temporal context well, but token usage grows with video length, so very long videos can exceed the model's context window. If you run into length limits, split the file into shorter segments and analyze each one separately.
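Token cost scales roughly linearly with duration, so whether to split can be estimated up front. The sketch below uses figures from Google's Gemini documentation at the time of writing, about 258 tokens per sampled frame plus 32 tokens per second of audio at default media resolution; treat them as approximate and check the current docs. The 900,000-token budget is an assumption that leaves headroom below a roughly one-million-token context window for the prompt and response. The helpers are hypothetical; `ffmpeg_cut_cmd` only builds an ffmpeg command (`-ss` seeks, `-t` limits duration, `-c copy` skips re-encoding) and does not run it.

```python
import math

# Approximate costs at default media resolution (assumption; verify against current Gemini docs).
FRAME_TOKENS = 258          # per sampled frame
AUDIO_TOKENS_PER_SEC = 32


def estimate_tokens(duration_s: float, fps: float = 1.0) -> int:
    """Rough input-token estimate for a video of the given length in seconds."""
    return math.ceil(duration_s * (fps * FRAME_TOKENS + AUDIO_TOKENS_PER_SEC))


def plan_segments(duration_s: float, budget_tokens: int = 900_000) -> list[tuple[float, float]]:
    """Split the video into equal (start, length) segments that each fit the token budget."""
    n = max(1, math.ceil(estimate_tokens(duration_s) / budget_tokens))
    length = duration_s / n
    return [(i * length, length) for i in range(n)]


def ffmpeg_cut_cmd(src: str, start: float, length: float, dst: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that copies one segment without re-encoding."""
    return ["ffmpeg", "-ss", f"{start:.3f}", "-i", src, "-t", f"{length:.3f}", "-c", "copy", dst]
```

For a one-hour recording, `estimate_tokens(3600)` gives 1,044,000, so `plan_segments(3600)` returns two 30-minute segments. With stream copy, cuts snap to keyframes, so segment boundaries are approximate; re-encode if exact boundaries matter.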

Metadata

Author: @aiwithabidi

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-aiwithabidi-gemini-video-analyzer": {
      "enabled": true,
      "auto_update": true
    }
  }
}
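If you prefer to script this step, here is a minimal sketch that merges the entry above into an existing clawhub.json without disturbing other plugins or top-level keys. It assumes the file is plain JSON (no comments) and that the top-level `"plugins"` object is where ClawHub reads plugin settings, as the snippet suggests; `enable_plugin` is a hypothetical helper, not a ClawHub command.

```python
import json
from pathlib import Path

PLUGIN_KEY = "official-aiwithabidi-gemini-video-analyzer"


def enable_plugin(config_path: str = "clawhub.json", auto_update: bool = True) -> dict:
    """Hypothetical helper: add or update this skill's entry, preserving the rest of the file."""
    path = Path(config_path)
    # Start from the existing config if there is one (assumed to be plain JSON).
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    # Update in place so any extra fields already on this entry are kept.
    entry = plugins.setdefault(PLUGIN_KEY, {})
    entry.update({"enabled": True, "auto_update": auto_update})
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```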

Tags (AI)

#video-analysis #multimodal #gemini #computer-vision #ai-automation

Safety Score: 4/5

Flags: file-read, external-api

Related Skills

onepassword

1Password Connect — vaults, items, secrets management for server-side applications.

aiwithabidi 4473

freshsales

Freshsales CRM integration — manage contacts, leads, deals, accounts, tasks, and sales sequences via the Freshsales API. Track deal pipelines, automate lead assignments, log activities, and generate sales reports. Built for AI agents — Python stdlib only, no dependencies. Use for sales CRM, contact management, deal tracking, pipeline reporting, and sales automation.

aiwithabidi 4473

agent-memory

Full AI agent memory stack — Mem0 unified memory engine with vector search (Qdrant) and knowledge graph (Neo4j), plus SQLite for structured data. Complete setup script and tools. Give your OpenClaw agent a real brain with semantic recall, entity relationships, and structured storage.

aiwithabidi 4473

neon

Neon serverless Postgres — manage projects, branches, databases, roles, endpoints, and compute via the Neon API. Create database branches for development, manage connection endpoints, scale compute, and monitor usage. Built for AI agents — Python stdlib only, zero dependencies. Use for serverless Postgres, database branching, database management, development workflows, and cloud database automation.

aiwithabidi 4473

github-intel

Analyze any GitHub repository in AI-friendly format. Convert entire repos to single markdown documents, generate architecture diagrams with Mermaid, inspect structure trees, language breakdowns, and recent activity. Includes GitHub URL tricks, API shortcuts, and advanced search techniques. Read-only analysis — never executes code from repositories. Built for AI agents — Python stdlib only, no dependencies. Use for repository analysis, code architecture review, open source research, GitHub intelligence, repo documentation, and codebase understanding.

aiwithabidi 4473