paddleocr-doc-parsing
Parse documents using PaddleOCR's API. Supports both sync and async modes for images and PDFs.
Why use this skill?
Efficiently convert images and PDFs to text with OpenClaw's PaddleOCR skill. Supports multiple languages, table extraction, and async processing.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/hiotec/paddleocr-doc-parsing-v2
What This Skill Does
The paddleocr-doc-parsing skill gives the OpenClaw AI agent an interface to the PaddleOCR engine for converting visual document formats, including images and PDFs, into machine-readable, structured text. It uses PaddleOCR's computer vision and text recognition models to extract content from complex layouts, identify tables, detect formulas, and preserve the structure of the original document. It supports synchronous execution for immediate, low-latency text extraction and asynchronous workflows for large, multi-page documents that need longer computation times. With support for over 110 languages, it connects scanned physical documents and static digital files to downstream data processing workflows.
Installation
To integrate this skill into your environment, use the OpenClaw package manager:
clawhub install openclaw/skills/skills/hiotec/paddleocr-doc-parsing-v2
After installation, set the following environment variables for authentication and endpoint configuration:
- PADDLEOCR_ACCESS_TOKEN: Required for API access.
- PADDLEOCR_API_URL: The endpoint for synchronous layout parsing.
- PADDLEOCR_JOB_URL: Required for managing async jobs.
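Before running the skill, it can help to confirm that these variables are actually set. The following is a minimal sketch, not part of the skill itself: the helper name `missing_paddleocr_vars` is hypothetical, and it assumes all three variables are wanted (PADDLEOCR_JOB_URL may only matter if you use async mode).

```python
import os

# Variable names are taken from the installation notes above.
REQUIRED_VARS = (
    "PADDLEOCR_ACCESS_TOKEN",
    "PADDLEOCR_API_URL",
    "PADDLEOCR_JOB_URL",  # assumption: checked even if you only use sync mode
)


def missing_paddleocr_vars(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_paddleocr_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All PaddleOCR environment variables are set.")
```

Passing a plain dictionary instead of the real environment makes the check easy to exercise in isolation.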
Use Cases
- Automated Data Entry: Automatically extract data from scanned invoices, receipts, and bank statements to populate accounting software.
- Content Digitization: Convert archival PDF scans or historical images into searchable text formats for research and database indexing.
- Accessibility Tools: Convert visual text blocks from images into clean, structured Markdown for text-to-speech accessibility features.
- Technical Document Extraction: Use the built-in layout analysis to parse complex technical manuals that contain mixed content like diagrams, tables, and mathematical formulas.
Example Prompts
- "Use paddleocr-doc-parsing to scan this invoice image and convert the table data into a JSON file."
- "Please parse the attached large PDF document using the async mode to ensure all pages are processed correctly."
- "Extract the text from this screenshot and save the output as a clean Markdown file in my documents folder."
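The first prompt asks for table data as JSON. If you want to do that conversion yourself from Markdown output, a rough sketch might look like the one below. It assumes the table arrives as a simple pipe-delimited Markdown table; the function name `markdown_table_to_records` is hypothetical, and escaped pipes or merged cells are not handled.

```python
import json


def markdown_table_to_records(markdown):
    """Parse the first pipe-delimited Markdown table into a list of dicts."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if rows:
                break  # the first table has ended
            continue
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Skip the alignment row (for example |---|:--:|) if it is present.
    if body and all(cell and set(cell) <= set("-:") for cell in body[0]):
        body = body[1:]
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    sample = """
| Item | Qty | Price |
|------|----:|------:|
| Pen  | 2   | 1.50  |
| Ink  | 1   | 4.00  |
"""
    print(json.dumps(markdown_table_to_records(sample), indent=2))
```

All values stay as strings here; converting quantities or prices to numbers is left to the caller, since the right types depend on the document.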
Tips & Limitations
- Timeout Considerations: Sync mode is ideal for single-page files; always use the async flag for multi-page documents or PDFs exceeding 5MB to avoid timeout errors.
- Precision: Ensure the input image resolution is at least 300 DPI for optimal text recognition accuracy, especially for documents with handwritten notes.
- Security: Be cautious when parsing sensitive documents; ensure that the PADDLEOCR_API_URL you point to is trusted, private infrastructure rather than a public endpoint when handling confidential data.
- Formatting: The tool outputs Markdown by default, which is highly effective for document structure but may require post-processing if you need strict CSV or Excel formatting for complex table structures.
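The timeout tip above can be turned into a small pre-flight check. This is a sketch under stated assumptions: the function names are hypothetical, "5MB" is read as 5 * 1024 * 1024 bytes, and the page count is supplied by the caller because the skill's own way of counting pages is not documented here.

```python
import os

# Assumption: the 5MB threshold from the tip, interpreted as mebibytes.
SYNC_SIZE_LIMIT_BYTES = 5 * 1024 * 1024


def choose_mode(size_bytes, page_count=1):
    """Return "sync" for small single-page inputs, otherwise "async"."""
    if page_count > 1 or size_bytes > SYNC_SIZE_LIMIT_BYTES:
        return "async"
    return "sync"


def choose_mode_for_file(path, page_count=1):
    """Apply choose_mode to a file on disk, using its size in bytes."""
    return choose_mode(os.path.getsize(path), page_count)
```

For example, `choose_mode(200_000)` selects sync mode, while `choose_mode(200_000, page_count=12)` selects async mode.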
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-hiotec-paddleocr-doc-parsing-v2": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-read, file-write, external-api
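If your clawhub.json already lists other plugins, pasting the fragment above by hand risks overwriting them. The sketch below merges the entry into an existing configuration instead. The helper name `enable_plugin` is hypothetical, and where clawhub.json lives and how clawhub reads it are assumptions, so the function works on an in-memory dictionary only.

```python
import json

# Plugin key as it appears in the clawhub.json fragment above.
PLUGIN_KEY = "official-hiotec-paddleocr-doc-parsing-v2"


def enable_plugin(config, key=PLUGIN_KEY):
    """Return a copy of config with the plugin enabled, keeping other entries."""
    updated = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    plugins = updated.setdefault("plugins", {})
    plugins[key] = {"enabled": True, "auto_update": True}
    return updated


if __name__ == "__main__":
    existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
    print(json.dumps(enable_plugin(existing), indent=2))
```

Load your real file with `json.load`, pass the result through `enable_plugin`, and write it back with `json.dump` once you have confirmed the output looks right.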