Official | Verified | productivity | Safety 4/5

multimodal-memory

Remember and retrieve visual content from conversations. Use when: (1) user sends an image, photo, chart, diagram, or screenshot and wants it saved/remembered; (2) user asks to capture or remember a website, URL, or web page UI; (3) user asks what you've seen before, wants to recall a past image, or searches visual memories; (4) user sends an image to find similar past content.

Why use this skill?

Enhance OpenClaw with multimodal-memory to store, index, and retrieve images, charts, and website screenshots using advanced visual analysis.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/horisky/minds-eye

What This Skill Does

The multimodal-memory skill turns your OpenClaw agent into a visual archive. It stores, categorizes, and retrieves visual information: user-provided photos, analytical charts, technical diagrams, and website screenshots. Image analysis is delegated to a dedicated GPT-4o-backed engine to keep the interpretation of visual data consistent and accurate. The results are indexed in a local SQLite database and a human-readable markdown log, so you can search your visual history with natural language queries.
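To make the "SQLite database plus markdown log" idea concrete, here is a minimal sketch of how such an index could work. It is not the skill's actual implementation: the table name `memories`, its columns, the function names, and the keyword-matching search are all assumptions made for illustration, and the description string stands in for whatever the analysis engine would return.

```python
import sqlite3
from datetime import datetime, timezone


def open_index(db_path):
    """Open (or create) a small SQLite index of image descriptions."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema; the real skill's schema is not documented here.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY,
               path TEXT NOT NULL,
               description TEXT NOT NULL,
               saved_at TEXT NOT NULL
           )"""
    )
    return conn


def remember(conn, log_path, image_path, description):
    """Record one image in the database and append a line to the markdown log."""
    saved_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute(
        "INSERT INTO memories (path, description, saved_at) VALUES (?, ?, ?)",
        (str(image_path), description, saved_at),
    )
    conn.commit()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"- {saved_at} `{image_path}`: {description}\n")


def search(conn, query):
    """Return (path, description) rows whose description contains every query word."""
    words = query.lower().split()
    if not words:
        return []
    where = " AND ".join("LOWER(description) LIKE ?" for _ in words)
    params = [f"%{w}%" for w in words]
    return conn.execute(
        f"SELECT path, description FROM memories WHERE {where} ORDER BY id", params
    ).fetchall()
```

Keeping both stores serves two audiences: the database answers queries quickly, while the markdown log stays readable in any text editor. A keyword match like this is only a stand-in for true natural language search.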

Installation

To install this skill, run the following command in your terminal:

clawhub install openclaw/skills/skills/horisky/minds-eye

If you intend to capture website UIs, also install the dependencies for the web capture engine:

pip install playwright && python -m playwright install chromium
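Once Playwright and Chromium are installed, a web capture could look roughly like the sketch below. It uses Playwright's standard synchronous Python API, but the function names, the file-naming scheme, and the output directory are assumptions for illustration; the skill's own capture script may work differently.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def screenshot_filename(url):
    """Derive a filesystem-safe PNG name from a URL (illustrative naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw) or "page"
    return f"{safe}.png"


def capture(url, out_dir):
    """Save a full-page screenshot of `url` into `out_dir` and return its absolute path."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_dir = Path(out_dir).expanduser().resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / screenshot_filename(url)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            page.screenshot(path=str(target), full_page=True)
        finally:
            browser.close()
    return target
```

For example, `capture("https://openclaw.org", "~/screenshots")` would write `openclaw.org.png` under that hypothetical directory, provided the page is reachable.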

Use Cases

  • Research & Documentation: Quickly save technical diagrams or architectural charts for future reference during deep-dive projects.
  • Web Development & UI/UX: Capture and compare different iterations of a website's interface to track design changes over time.
  • Visual Knowledge Base: Maintain a searchable index of screenshots, allowing you to ask the agent to recall specific UI elements or information from past conversations.
  • Similarity Search: Identify patterns by finding past images or charts that resemble a newly provided file.
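
This page does not say how the skill matches similar images. One common approach, shown here purely as an assumption, is to represent each image as a numeric feature vector and rank stored items by cosine similarity. The vectors and names below are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec, stored, top_k=3):
    """Rank stored (name, vector) pairs by similarity to query_vec, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in stored]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

In practice the vectors would come from an embedding model rather than being written by hand, and a large archive would need an index instead of this linear scan.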

Example Prompts

  1. "I'm uploading a diagram of our network architecture. Please save this so I can refer back to it later."
  2. "Can you take a screenshot of https://openclaw.org and save it to my visual memory?"
  3. "Show me all the dark mode login screens I've saved in the past month."

Tips & Limitations

  • Always Use the Engine: Do not describe images yourself. The skill delegates analysis to analyze.py, which is calibrated for the multimodal requirements of this storage system.
  • Path Accuracy: Always provide or request absolute file paths to ensure the Python scripts can correctly locate the local assets for indexing.
  • Storage Management: The system stores data in ~/.multimodal-memory/. Periodically check memory.md to see the current state of your visual knowledge base.
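  • Privacy: Processed images are stored locally on your machine, so secure your local storage appropriately. Because analysis relies on a GPT-4o-backed engine and the skill carries the external-api flag, assume image content may be sent to an external service during analysis.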
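
The path and storage tips above can be sketched as a few small helpers. These are illustrative only: the function names and the list of accepted image extensions are assumptions, and the only details taken from this page are the `~/.multimodal-memory/` directory and the `memory.md` file name.

```python
from pathlib import Path


def to_absolute(path_str):
    """Expand ~ and resolve a user-supplied path to an absolute one."""
    return Path(path_str).expanduser().resolve()


def require_image(path_str, allowed=(".png", ".jpg", ".jpeg", ".gif", ".webp")):
    """Return the absolute path if it points at an existing image file, else raise."""
    # The extension list is an assumption, not the skill's documented behaviour.
    path = to_absolute(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    if path.suffix.lower() not in allowed:
        raise ValueError(f"unsupported image type: {path.suffix}")
    return path


def storage_summary(root="~/.multimodal-memory"):
    """Report whether the storage directory and memory.md exist, and the log's size."""
    root_path = to_absolute(root)
    log = root_path / "memory.md"
    return {
        "root": str(root_path),
        "root_exists": root_path.is_dir(),
        "log_exists": log.is_file(),
        "log_bytes": log.stat().st_size if log.is_file() else 0,
    }
```

Resolving paths up front means a relative path typed in one working directory still points at the same file when a script runs from another.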

Metadata

Author: @horisky
Stars: 2387
Views: 0
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-horisky-minds-eye": {
      "enabled": true,
      "auto_update": true
    }
  }
}
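
If your clawhub.json already contains other settings, pasting by hand risks overwriting them. The sketch below merges the entry programmatically instead. It assumes only what the snippet above shows (a top-level `plugins` object keyed by plugin id); the function name and the idea of scripting this step are illustrative, not part of ClawHub's tooling.

```python
import json
from pathlib import Path


def enable_plugin(config_path, plugin_id="official-horisky-minds-eye"):
    """Add or update one plugin entry in a JSON config, keeping other keys intact."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `enable_plugin("clawhub.json")` leaves any other plugins and top-level keys untouched and creates the file if it does not exist yet.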

Tags (AI)

#computer-vision #memory #image-analysis #storage #archiving
Safety Score: 4/5

Flags: file-write, file-read, external-api, code-execution