multimodal-memory
Remember and retrieve visual content from conversations. Use when: (1) user sends an image, photo, chart, diagram, or screenshot and wants it saved/remembered; (2) user asks to capture or remember a website, URL, or web page UI; (3) user asks what you've seen before, wants to recall a past image, or searches visual memories; (4) user sends an image to find similar past content.
Why use this skill?
Enhance OpenClaw with multimodal-memory to store, index, and retrieve images, charts, and website screenshots using advanced visual analysis.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/horisky/minds-eye
What This Skill Does
The multimodal-memory skill turns your OpenClaw agent into a visual archive. It stores, categorizes, and retrieves visual information, including user-provided photos, analytical charts, technical diagrams, and website screenshots. Image analysis is offloaded to a dedicated GPT-4o-backed engine so that visual data is interpreted consistently. The results are indexed in a local SQLite database and a readable markdown log, which lets you search your visual history with natural language queries.
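The skill's actual storage layout is not documented here, so the following is only a minimal sketch of how a description returned by the analysis engine might be indexed in SQLite and searched by keyword. The table name `memories`, its columns, and the helpers `open_index`, `remember`, and `search` are hypothetical and are not the skill's real schema or API.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) a small index; the schema is an assumption for illustration."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY,
               image_path TEXT NOT NULL,
               description TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def remember(conn, image_path, description):
    """Store one image path together with its text description."""
    cur = conn.execute(
        "INSERT INTO memories (image_path, description) VALUES (?, ?)",
        (image_path, description),
    )
    conn.commit()
    return cur.lastrowid


def search(conn, query):
    """Return paths whose description contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    clause = " AND ".join("lower(description) LIKE ?" for _ in words)
    params = [f"%{w}%" for w in words]
    rows = conn.execute(
        f"SELECT image_path FROM memories WHERE {clause} ORDER BY id", params
    )
    return [row[0] for row in rows]
```

A plain keyword match like this is far simpler than natural language search, but it shows the basic store-then-query flow against a local database.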
Installation
To install this skill, run the following command in your terminal:
clawhub install openclaw/skills/skills/horisky/minds-eye
If you intend to capture website UIs, also install the dependencies for the web capture engine:
pip install playwright && python -m playwright install chromium
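To give a sense of what the Playwright dependency is used for, here is a minimal sketch of a full-page capture using Playwright's synchronous Python API. It is not the skill's own capture script: the helper names `screenshot_path` and `capture`, and the `screenshots` subdirectory, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def screenshot_path(url, root="~/.multimodal-memory/screenshots"):
    """Build an absolute PNG path from the URL's host plus a short hash.

    The 'screenshots' subdirectory is hypothetical, not a documented location.
    """
    host = (urlparse(url).netloc or "page").replace(":", "_")
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return Path(root).expanduser().resolve() / f"{host}-{digest}.png"


def capture(url, out_path):
    """Save a full-page screenshot of url to out_path using headless Chromium."""
    # Imported lazily so the path helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=str(out_path), full_page=True)
        browser.close()
    return out_path
```

Calling `capture("https://openclaw.org", screenshot_path("https://openclaw.org"))` would write a PNG to an absolute path, which matches the tip about always passing absolute paths to the scripts.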
Use Cases
- Research & Documentation: Quickly save technical diagrams or architectural charts for future reference during deep-dive projects.
- Web Development & UI/UX: Capture and compare different iterations of a website's interface to track design changes over time.
- Visual Knowledge Base: Maintain a searchable index of screenshots, allowing you to ask the agent to recall specific UI elements or information from past conversations.
- Similarity Search: Identify patterns by finding past images or charts that resemble a newly provided file.
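How the skill ranks similar images is not described here. As one possible illustration of the similarity use case, the sketch below compares the text descriptions of stored images using Jaccard word overlap. This is a swapped-in, simplified technique, and the function names and the `(image_path, description)` record shape are hypothetical.

```python
def tokens(text):
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def jaccard(a, b):
    """Jaccard similarity of two descriptions: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def most_similar(new_description, stored, top_k=3):
    """Rank stored (image_path, description) pairs against a new description.

    Returns up to top_k (image_path, score) pairs with a non-zero score,
    best match first.
    """
    scored = [(jaccard(new_description, desc), path) for path, desc in stored]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(path, score) for score, path in scored[:top_k] if score > 0]
```

A real system would more likely compare image or text embeddings; word overlap is used here only because it needs no external dependencies.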
Example Prompts
- "I'm uploading a diagram of our network architecture. Please save this so I can refer back to it later."
- "Can you take a screenshot of https://openclaw.org and save it to my visual memory?"
- "Show me all the dark mode login screens I've saved in the past month."
Tips & Limitations
- Always Use the Engine: Never attempt to describe images yourself; the skill is designed to delegate analysis to analyze.py, which is calibrated for the specific multimodal requirements of this storage system.
- Path Accuracy: Always provide or request absolute file paths to ensure the Python scripts can correctly locate the local assets for indexing.
- Storage Management: The system stores data in ~/.multimodal-memory/. Periodically check memory.md to see the current state of your visual knowledge base.
- Privacy: Note that all processed images are stored locally on your machine. Ensure your local storage is secured appropriately.
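The path and storage tips above can be sketched as two small helpers: one that normalizes a user-supplied path to an absolute one before it is handed to a script, and one that reads the markdown log. The helper names are hypothetical, and the assumption that memory.md sits directly inside ~/.multimodal-memory/ is inferred from the tip rather than confirmed.

```python
from pathlib import Path

# Storage directory named in the tips; the exact file layout inside it is assumed.
MEMORY_ROOT = Path("~/.multimodal-memory").expanduser()


def to_absolute(path_str):
    """Expand ~ and resolve to an absolute path; fail early if the file is missing."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such image file: {path}")
    return path


def read_memory_log(root=MEMORY_ROOT):
    """Return the contents of memory.md, or an empty string if it does not exist."""
    log = Path(root) / "memory.md"
    return log.read_text(encoding="utf-8") if log.exists() else ""
```

Resolving the path up front means a relative path such as `./chart.png` is rejected or normalized before any indexing starts, instead of failing later inside a script running from a different working directory.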
Metadata
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-horisky-minds-eye": {
"enabled": true,
"auto_update": true
}
}
}
Flags: file-write, file-read, external-api, code-execution