What This Skill Does

The video-analyzer skill lets the OpenClaw agent interpret video files by breaking them down into static images. Using ffmpeg, it extracts frames at a configurable sampling rate so the agent can "see" how a video progresses. The agent then processes the extracted frames sequentially to identify scene changes, extract text, monitor UI elements, or describe the actions shown. The skill bridges raw binary video data and text-based semantic understanding, enabling tasks that need high-level comprehension of a video without a native video player.
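The extraction step can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names, the PNG output format, and the frame_%06d.png naming scheme are assumptions made for the example. The only ffmpeg features relied on are the -i input option and the fps video filter.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path: str, out_dir: str, fps: float) -> list[str]:
    """Build an ffmpeg argument list that samples `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Hypothetical naming scheme: zero-padded so a lexical sort matches playback order.
    pattern = str(Path(out_dir) / "frame_%06d.png")
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps:g}", pattern]


def extract_frames(video_path: str, out_dir: str, fps: float = 1.0) -> list[Path]:
    """Run ffmpeg and return the extracted frame paths in playback order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_extract_command(video_path, out_dir, fps), check=True)
    return sorted(Path(out_dir).glob("frame_*.png"))
```

Calling extract_frames("./data/recording.mov", "./frames", fps=1) would, under these assumptions, write one image per second of video into ./frames and return the paths in order, ready to be passed to the agent one at a time.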

Installation

To use this skill, make sure ffmpeg is installed on the host machine. On Ubuntu/Debian, run sudo apt-get install -y ffmpeg; on macOS, run brew install ffmpeg. Once ffmpeg is available, install the skill from the OpenClaw terminal with: clawhub install openclaw/skills/skills/kartinw/video-watcher.
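Before installing the skill, you may want to confirm that ffmpeg is actually reachable on the PATH. The check below is a small sketch using only the Python standard library; the helper name ffmpeg_available is hypothetical and not part of the skill.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None
```

If ffmpeg_available() returns False, install ffmpeg with the package manager command for your platform and check again.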

Use Cases

This skill is useful for anyone who needs to audit large volumes of video. Common use cases include summarizing meeting recordings, extracting specific data points from training tutorials, checking product demonstration videos for UI inconsistencies, and documenting events in long surveillance footage. It is particularly effective when you need to confirm whether a specific event occurred, or when you want to archive the visual contents of a video as a searchable, text-based log.

Example Prompts

"Analyze the provided tutorial video located at ./downloads/setup.mp4 and tell me which menu the user clicks on after the welcome screen."
"Extract frames from ./data/recording.mov at 1 FPS and generate a summary report of the key milestones observed throughout the video."
"Look at the video file ./project/build.mp4 and describe the changes in the UI between the first and the last frame."

Tips & Limitations

For best results, adapt the sampling rate to the length of the video. Short videos (under 60 seconds) are best analyzed frame by frame, whereas long-form content benefits from a lower sampling rate (e.g., 1 frame per 10 seconds) to avoid overflowing the context window. Check for adequate storage space before processing large files, since frame extraction creates many image files. Note that the accuracy of the analysis depends on the visual clarity of the frames and on the agent's ability to interpret image data.
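The sampling and storage advice above can be sketched as two small helpers. Everything here is an assumption for illustration: the 60-second threshold comes from the tip itself, but the max_frames budget (300 by default), the bytes_per_frame estimate, and the function names are hypothetical and should be tuned to your model's context window and your image format.

```python
import shutil


def choose_sampling_fps(duration_s: float, native_fps: float, max_frames: int = 300) -> float:
    """Pick a sampling rate: every frame for short clips, a frame budget for long ones."""
    if duration_s <= 0 or native_fps <= 0 or max_frames <= 0:
        raise ValueError("duration_s, native_fps and max_frames must be positive")
    if duration_s < 60:
        return native_fps  # short video: analyze frame by frame
    # Long video: spread the frame budget evenly, never exceeding the native rate.
    return min(native_fps, max_frames / duration_s)


def enough_disk_space(out_dir: str, frame_count: int, bytes_per_frame: int) -> bool:
    """Return True if the volume holding `out_dir` can fit the estimated frames."""
    needed = frame_count * bytes_per_frame
    return shutil.disk_usage(out_dir).free >= needed
```

With the default budget, a 3000-second recording is sampled at 300 / 3000 = 0.1 frames per second, which matches the "1 frame per 10 seconds" suggestion.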
