paddleocr-text-recognition
Use this skill whenever the user wants text extracted from images, photos, scans, screenshots, or scanned PDFs. Returns exact machine-readable strings with line-level text and optional bbox coordinates. Strong accuracy for CJK, small print, and handwritten text. Trigger terms: OCR, 文字识别 (text recognition), 图片转文字 (image to text), 截图识字 (read text from a screenshot), 提取图中文字 (extract text from an image), 扫描识字 (read text from a scan), 识字 (recognize characters), 纯文字 (plain text), plain text extraction, 坐标 (coordinates), 检测框 (detection box), bbox, bounding box, image to text, screenshot, photo scan, recognize text.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/bobholamovic/paddleocr-text-recognition
What This Skill Does
The paddleocr-text-recognition skill is an optical character recognition (OCR) interface for the OpenClaw environment. It uses the PaddleOCR engine to extract printed or handwritten text, along with coordinate-based spatial information, from images and PDF documents. This lets the agent work with visual inputs such as invoices, screenshots, or scanned documents without relying on its own image interpretation. A dedicated Python wrapper runs the OCR call, parses the result, and returns it to the agent as structured JSON.
Installation
To integrate this skill into your OpenClaw agent, execute the following command in your terminal:
clawhub install openclaw/skills/skills/bobholamovic/paddleocr-text-recognition
Once the repository is synced, navigate to the skill directory (skills/paddleocr-text-recognition) and prepare your environment by running the dependency installer:
pip install -r scripts/requirements.txt
Use Cases
This skill suits workflows that need to extract data from image-based files with no text layer. Typical scenarios include:
- Converting image-based receipts or invoices into structured machine-readable text for expense reporting.
- Digitizing historical scans or images of documents that lack underlying text layers.
- Extracting specific fields, tables, or text blocks from structured forms located in images or PDF files.
- Automating the ingestion of data from URLs pointing to visual content, such as web-based product specifications or diagrams.
Example Prompts
- "Please extract all the text from this invoice image and save it as a JSON object so I can process the total amount."
- "I've uploaded a screenshot of a technical error log. Use PaddleOCR to read the text and highlight any lines containing 'Critical Failure'."
- "Go to this URL: https://example.com/receipt.pdf, run OCR on the document, and list all line items found in the table."
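The second prompt above boils down to filtering recognized lines by a phrase. A minimal sketch of that step, assuming the OCR result has already been loaded into a list of dictionaries with a "text" key and an optional "bbox" key. That shape is hypothetical: this page does not document the JSON schema the skill actually returns, and find_lines is an illustrative name, not part of the skill.

```python
from typing import Any, Dict, List


def find_lines(lines: List[Dict[str, Any]], needle: str) -> List[Dict[str, Any]]:
    """Return the recognized lines whose text contains `needle`.

    Assumes each entry looks like {"text": str, "bbox": [...]} (hypothetical
    schema). Matching is case-insensitive, since OCR output may vary in case.
    """
    needle_lower = needle.lower()
    return [
        line
        for line in lines
        if needle_lower in str(line.get("text", "")).lower()
    ]


if __name__ == "__main__":
    # Hand-written sample data standing in for real OCR output.
    sample = [
        {"text": "INFO service started", "bbox": [0, 0, 200, 20]},
        {"text": "ERROR Critical Failure in worker 3", "bbox": [0, 24, 320, 44]},
    ]
    for hit in find_lines(sample, "Critical Failure"):
        print(hit["text"], hit.get("bbox"))
```

Keeping the bbox alongside each hit means the agent can report where on the image the matching line sits, if coordinates were requested.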
Tips & Limitations
- Direct API Execution: Always use the scripts/ocr_caller.py script provided. Do not attempt to process images using alternate methods.
- Error Handling: If the API returns an error, display it clearly to the user. Do not attempt to use built-in vision capabilities as a fallback, as this violates strict operational protocols.
- File Management: By default, results are saved in the system temp directory. Use the --stdout flag if you prefer immediate parsing without persistence, or define a specific path with --output for better file management.
- Input Precision: Ensure the file path or URL provided is accessible and correctly formatted to avoid API timeouts or connection failures.
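The tips above can be sketched as a small helper that assembles the call to scripts/ocr_caller.py and surfaces any error instead of falling back to another method. Only the script path and the --stdout and --output flags come from this page. Everything else is an assumption: how the script receives its input (so input_args is passed through untouched), that --output takes the path as a separate argument, that the two flags are alternatives, and that a non-zero exit code signals failure.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_ocr_command(
    input_args: Sequence[str],
    output_path: Optional[str] = None,
    use_stdout: bool = False,
    script: str = "scripts/ocr_caller.py",
) -> List[str]:
    """Assemble an argument list for the OCR wrapper script.

    `input_args` is whatever the script needs to locate the file or URL;
    this page does not document that, so it is not guessed here.
    """
    if use_stdout and output_path is not None:
        # Assumption: the two flags are treated as mutually exclusive.
        raise ValueError("choose either --stdout or --output, not both")
    command = [sys.executable, script, *input_args]
    if use_stdout:
        command.append("--stdout")
    elif output_path is not None:
        # Assumption: the path follows the flag as a separate argument.
        command.extend(["--output", output_path])
    return command


def run_ocr(command: Sequence[str]) -> str:
    """Run the command and return its stdout, raising on failure.

    Assumption: a non-zero exit code means the OCR call failed. The error
    text is raised so it can be shown to the user as-is.
    """
    result = subprocess.run(list(command), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "OCR call failed")
    return result.stdout
```

With neither flag set, the command leaves the script on its default behavior, which this page describes as saving results in the system temp directory.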
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-bobholamovic-paddleocr-text-recognition": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-read, file-write, code-execution
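If pasting the plugin entry into clawhub.json by hand is inconvenient, the same change can be made with a short script. This is a minimal sketch that merges the entry shown above into an existing configuration without touching other plugins. The plugin key and its two settings come from this page; the function names and the assumption that clawhub.json sits in the current directory are illustrative only.

```python
import json
from pathlib import Path

PLUGIN_KEY = "official-bobholamovic-paddleocr-text-recognition"


def enable_plugin(config: dict) -> dict:
    """Return a copy of `config` with the plugin entry added or overwritten."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[PLUGIN_KEY] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path: str = "clawhub.json") -> None:
    """Read the config file (if present), merge the entry, and write it back.

    The default path is an assumption; adjust it to wherever your
    clawhub.json actually lives.
    """
    config_path = Path(path)
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    updated = enable_plugin(config)
    config_path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Working on a copy keeps the merge step free of side effects, so it can be checked before anything is written to disk.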