openocr-skills
Extract text from images, documents, and scanned PDFs using OpenOCR, a lightweight and efficient OCR system whose document parsing model needs only 0.1B parameters and can run recognition on a personal PC. Supports text detection, text recognition, universal VLM recognition, and document parsing with layout analysis.
Why use this skill?
Efficiently extract text, formulas, and tables from images and PDFs using OpenOCR. Lightweight, accurate, and optimized for local document parsing and layout analysis.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/topdu/openocr-skill
What This Skill Does
The OpenOCR skill gives OpenClaw a lightweight, efficient optical character recognition engine. Built on a 0.1B-parameter document parsing model, it lets AI agents read, interpret, and structure content from a wide variety of visual sources, including scanned documents, photographs of text, complex mathematical formulas, and tables. Unlike heavier, cloud-dependent OCR systems, OpenOCR is designed to run on local hardware while keeping recognition accuracy high. Its tasks range from simple text detection to full document layout analysis, converting unstructured image data into plain text or structured Markdown. By combining layout analysis with universal recognition, the skill connects physical documents to the agent's text-based reasoning.
Installation
To integrate OpenOCR into your agent's capability set, use the OpenClaw command-line interface. Run the following command in your terminal:
clawhub install openclaw/skills/skills/topdu/openocr-skill
Make sure the required system dependencies and Python environment are set up as described in the OpenClaw core documentation. Once installed, the skill is registered with your agent automatically.
Use Cases
- Digitization & Archiving: Convert legacy paper records, invoices, and physical receipts into searchable digital databases or spreadsheets.
- Academic & Research Assistance: Extract complex mathematical equations and technical formulas from screenshots or research papers to perform further symbolic computation.
- Document Structuring: Perform layout analysis on multi-column documents or forms, preserving the logical order of text and data tables.
- Data Extraction: Automate the entry of information from identity cards, labels, or product packaging into CRM or ERP systems.
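Document structuring depends on putting recognized blocks back into reading order. The sketch below shows one simple heuristic for a multi-column page: assign each block to a column by its horizontal center, then read columns left to right and blocks top to bottom. It is not OpenOCR's own layout algorithm; the TextBlock type, the reading_order function, and the equal-width-column assumption are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBlock:
    # Hypothetical recognized block: pixel bounding box plus its text.
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def reading_order(blocks: List[TextBlock], page_width: float, columns: int = 2) -> List[TextBlock]:
    """Sort blocks column by column (left to right), then top to bottom.

    Assumes equal-width columns; an illustrative heuristic, not OpenOCR's layout logic.
    """
    col_width = page_width / columns

    def sort_key(block: TextBlock):
        x_center = (block.x0 + block.x1) / 2
        # Clamp so blocks touching the page edges still land in a valid column.
        col = max(0, min(int(x_center // col_width), columns - 1))
        return (col, block.y0, block.x0)

    return sorted(blocks, key=sort_key)


if __name__ == "__main__":
    page = [
        TextBlock(320, 40, 580, 80, "Right column, first paragraph"),
        TextBlock(20, 200, 280, 240, "Left column, second paragraph"),
        TextBlock(20, 40, 280, 80, "Left column, first paragraph"),
        TextBlock(320, 200, 580, 240, "Right column, second paragraph"),
    ]
    # Prints both left-column paragraphs, then both right-column paragraphs.
    for block in reading_order(page, page_width=600):
        print(block.text)
```

Real layouts mix full-width headers, figures, and uneven columns, so a layout-analysis model is usually needed; this heuristic only illustrates what "preserving logical order" means.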
Example Prompts
- "Extract all text from this receipt image and format it as a markdown table with date, merchant, and total amount."
- "Analyze this PDF page and perform layout analysis to extract the mathematical formulas and text content separately."
- "Scan this photo of a handwritten note and convert it into a clean, digital text document."
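For the receipt prompt, the post-processing step might look like the following sketch. It assumes the OCR output is a list of text lines (a hypothetical format, not OpenOCR's documented output), treats the first non-empty line as the merchant name, and takes the last line starting with "Total" as the amount. The receipt_to_markdown helper and its rules are illustrative heuristics only.

```python
import re
from typing import List

# Illustrative patterns: ISO dates (2024-03-15) or US-style dates (03/15/2024).
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b")
# A line that begins with "Total" (so "Subtotal" is skipped), followed by an amount.
TOTAL_RE = re.compile(r"^\s*total\b\D*(\d+(?:\.\d{2})?)", re.IGNORECASE)


def receipt_to_markdown(lines: List[str]) -> str:
    """Build a one-row Markdown table (date, merchant, total) from OCR text lines."""
    # Assumption: the merchant name is the first non-empty line.
    merchant = next((line.strip() for line in lines if line.strip()), "")

    date = ""
    for line in lines:
        match = DATE_RE.search(line)
        if match:
            date = match.group(1)
            break

    total = ""
    for line in lines:
        match = TOTAL_RE.search(line)
        if match:
            total = match.group(1)  # keep the last match in case "Total" appears twice

    return "\n".join([
        "| Date | Merchant | Total |",
        "| --- | --- | --- |",
        f"| {date} | {merchant} | {total} |",
    ])


if __name__ == "__main__":
    ocr_lines = ["ACME Grocery", "123 Main St", "2024-03-15 14:02",
                 "Milk 3.49", "Subtotal 12.00", "Tax 0.96", "TOTAL: $12.96"]
    print(receipt_to_markdown(ocr_lines))
```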
Tips & Limitations
- Performance: The model is optimized for personal PCs, but high-resolution scans or very long documents may benefit from GPU acceleration; set the 'use_gpu' flag to 'auto' for best results.
- Pre-processing: For blurry or low-light images, consider basic image enhancement or cropping to the text region before calling the 'det' (detection) task to improve recognition accuracy.
- Complexity: The 'doc' task is the most capable option for layout analysis but needs more memory than the 'rec' (text recognition only) task. Use a specific task rather than defaulting to a complex one when you only need simple text reading.
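The advice to prefer specific tasks can be expressed as a small selection rule. This sketch uses the task names from the tips above ('det', 'rec', 'doc'); the need labels, the choose_task helper, the relative cost of 'det' versus 'rec', and the assumption that a heavier task also covers what lighter ones do are all hypothetical and not part of OpenOCR's interface.

```python
from typing import Iterable

# Ordered lightest to heaviest. 'doc' needing more memory than 'rec' is stated in
# the tips; placing 'det' below 'rec' is an assumption.
TASKS_BY_COST = ["det", "rec", "doc"]

# Which task is assumed to satisfy each kind of request (hypothetical labels).
NEED_TO_TASK = {
    "locate_text": "det",
    "read_text": "rec",
    "layout": "doc",
    "formulas": "doc",
    "tables": "doc",
}


def choose_task(needs: Iterable[str]) -> str:
    """Return the lightest task assumed to cover every requested need."""
    tasks = set()
    for need in needs:
        if need not in NEED_TO_TASK:
            raise ValueError(f"unknown need: {need!r}")
        tasks.add(NEED_TO_TASK[need])
    if not tasks:
        raise ValueError("at least one need is required")
    # The heaviest required task is the lightest one that covers everything,
    # under the assumption that heavier tasks include the lighter ones.
    return max(tasks, key=TASKS_BY_COST.index)


if __name__ == "__main__":
    print(choose_task(["read_text"]))            # plain reading stays on 'rec'
    print(choose_task(["read_text", "tables"]))  # tables escalate to 'doc'
```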
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-topdu-openocr-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags
Flags: file-read
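If clawhub.json already lists other plugins, the plugin entry shown in the configuration snippet has to be merged in rather than pasted over the whole file. A minimal Python sketch, assuming the file is plain JSON with the top-level "plugins" object shown there; enable_plugin and enable_in_file are hypothetical helpers, not part of the clawhub CLI.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-topdu-openocr-skill"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of config with the plugin enabled, leaving other plugins untouched."""
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))  # keep any extra keys already present
    entry.update({"enabled": True, "auto_update": True})
    plugins[plugin_id] = entry
    updated["plugins"] = plugins
    return updated


def enable_in_file(path: Path) -> None:
    """Read clawhub.json (or start empty), merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Returning a new dictionary instead of mutating the input keeps the merge easy to test and avoids partially written state if serialization fails.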
Related Skills
comparison-table-gen
Auto-generates comparison tables for concepts, drugs, or study results in Markdown format.
AB-Agents-Vision-MiniMax
👁️ Image analysis via MiniMax VL API. Describe images, extract text from screenshots, analyze photos. Requires MiniMax Token Plan API key (free tier available).
AB-Agents-Vision
👁️ Image analysis using MiniMax VL API. Describe images, extract text from screenshots, analyze photos. Works with local files and URLs. Simple shell wrapper.
DocPilot
Intelligent document processing expert supporting document parsing, information extraction, and document classification.