openocr-skills
Extract text from images, documents, and scanned PDFs using OpenOCR. Supports text detection, text recognition, universal VLM recognition, and document parsing with layout analysis.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/topdu/opencr-skill
OpenOCR Skill
Overview
This skill extracts text, parses documents, and performs universal recognition using OpenOCR, an accurate and efficient general-purpose OCR system. It provides a unified interface for five tasks: text detection, text recognition, end-to-end OCR, VLM-based universal recognition (text, formulas, tables), and document parsing with layout analysis. It supports Chinese, English, and other languages.
How to Use
- Provide the image, scanned document, or PDF
- Optionally specify the task type (det/rec/ocr/unirec/doc)
- I'll extract text, formulas, tables, or full document structure
Example prompts:
- "Extract all text from this image"
- "Detect text regions in this photo"
- "Recognize the formula in this screenshot"
- "Parse this PDF document with layout analysis"
- "Convert this scanned page to Markdown"
Domain Knowledge
OpenOCR Fundamentals
from openocr import OpenOCR
# Initialize with a specific task
engine = OpenOCR(task='ocr')
# Run OCR on an image (callable interface)
results, time_dicts = engine(image_path='image.jpg')
# Results contain detected boxes with recognized text
for result in results:
    for line in result:
        box = line[0]      # Bounding box coordinates
        text = line[1][0]  # Recognized text
        conf = line[1][1]  # Confidence score
        print(f"{text} ({conf:.2f})")
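The loop above can be wrapped in a small helper that keeps only confident lines. This is a minimal sketch, not part of OpenOCR: `collect_text`, `min_conf`, and `sample_results` are hypothetical names, and the sample data is hand-written in the same nested shape as the loop above rather than real OCR output.

```python
def collect_text(results, min_conf=0.5):
    """Return recognized strings whose confidence is at least min_conf."""
    kept = []
    for result in results:        # one entry per image
        for line in result:       # one entry per detected text line
            text, conf = line[1]  # line[0] is the bounding box
            if conf >= min_conf:
                kept.append(text)
    return kept


# Hand-written sample mirroring the structure above (assumed, not real output)
sample_results = [[
    [[[0, 0], [100, 0], [100, 20], [0, 20]], ("Hello", 0.98)],
    [[[0, 30], [100, 30], [100, 50], [0, 50]], ("w0rld", 0.41)],
]]
print(collect_text(sample_results))
```

Lowering `min_conf` keeps more lines at the cost of more recognition noise; the right threshold depends on the images.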
Supported Tasks
# Available task types
tasks = {
    'det': 'Text Detection - detect text regions with bounding boxes',
    'rec': 'Text Recognition - recognize text from cropped images',
    'ocr': 'End-to-End OCR - detection + recognition pipeline',
    'unirec': 'Universal Recognition - VLM-based text/formula/table recognition (0.1B params)',
    'doc': 'Document Parsing - layout analysis + universal recognition (0.1B params)',
}
# Task selection via parameter
det_engine = OpenOCR(task='det')
rec_engine = OpenOCR(task='rec')
ocr_engine = OpenOCR(task='ocr')
unirec_engine = OpenOCR(task='unirec')
doc_engine = OpenOCR(task='doc')
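When the task type is not given explicitly, a caller might pick one from the input file. The sketch below is one possible heuristic and not OpenOCR behavior: `choose_task` and `VALID_TASKS` are hypothetical, and the rule "PDF goes to 'doc', everything else to 'ocr'" is an assumption made for illustration.

```python
from pathlib import Path

# The five task names listed above
VALID_TASKS = {"det", "rec", "ocr", "unirec", "doc"}


def choose_task(path, requested=None):
    """Return the requested task if valid, else guess from the file suffix."""
    if requested is not None:
        if requested not in VALID_TASKS:
            raise ValueError(f"unknown task {requested!r}; expected one of {sorted(VALID_TASKS)}")
        return requested
    # Assumed heuristic: PDFs get document parsing, images get end-to-end OCR
    return "doc" if Path(path).suffix.lower() == ".pdf" else "ocr"


print(choose_task("scan.pdf"))
print(choose_task("photo.jpg"))
```

The returned string could then be passed as the `task` argument shown above.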
Configuration Options
from openocr import OpenOCR
# === Text Detection ===
detector = OpenOCR(
    task='det',
    backend='onnx',            # 'onnx' (default) or 'torch'
    onnx_det_model_path=None,  # Custom detection model (auto-downloads if None)
    use_gpu='auto',            # 'auto', 'true', or 'false'
)
# === Text Recognition ===
recognizer = OpenOCR(
    task='rec',
    mode='mobile',             # 'mobile' (fast) or 'server' (accurate)
    backend='onnx',            # 'onnx' (default) or 'torch'
    onnx_rec_model_path=None,  # Custom recognition model
    use_gpu='auto',
)
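Because several options take string values, a typo such as `backend='onxx'` is easy to make. The sketch below checks option values against the choices listed in the comments above before an engine is built. `build_options` and `ALLOWED` are hypothetical helpers, not part of OpenOCR, and the allowed values are taken only from the comments in this section.

```python
# Allowed values as listed in the configuration comments above
ALLOWED = {
    "backend": {"onnx", "torch"},
    "mode": {"mobile", "server"},
    "use_gpu": {"auto", "true", "false"},
}


def build_options(task, **options):
    """Validate known string options and return a keyword-argument dict."""
    for name, value in options.items():
        allowed = ALLOWED.get(name)
        # Options not listed in ALLOWED (e.g. model paths) pass through unchecked
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return {"task": task, **options}


rec_options = build_options("rec", mode="server", backend="onnx", use_gpu="auto")
print(rec_options)
```

The resulting dict is meant to be unpacked into the constructor, for example `OpenOCR(**rec_options)`.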
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-topdu-opencr-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}
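If clawhub.json already contains other plugins, pasting by hand risks overwriting them. The sketch below merges the entry instead. It assumes clawhub.json is a plain JSON file with a top-level "plugins" object, as in the snippet above; `enable_plugin` is a hypothetical helper, not a ClawHub command.

```python
import json
from pathlib import Path


def enable_plugin(config_path, plugin_id):
    """Add or update one plugin entry while preserving existing settings."""
    path = Path(config_path)
    # Start from the existing file if present, otherwise from an empty config
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `enable_plugin("clawhub.json", "official-topdu-opencr-skill")` would write the same entry as the snippet above and leave other plugin entries untouched.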