What This Skill Does

The paddleocr-doc-parsing skill gives the OpenClaw AI agent an interface to the PaddleOCR engine for converting visual documents, including images and PDFs, into machine-readable, structured text. Using PaddleOCR's computer vision and text recognition models, it extracts content from complex layouts, identifies tables, detects formulas, and preserves the structure of the original document. It supports synchronous execution for immediate, low-latency text extraction and asynchronous workflows for large, multi-page documents that take longer to process. With support for over 110 languages, it bridges scanned or static digital documents and downstream data-processing workflows.
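The choice between synchronous and asynchronous execution can be sketched as a small helper. This is only an illustration, not part of the skill: the function name `choose_mode` is hypothetical, and the thresholds (single-page files, 5 MB) are taken from the Tips & Limitations section, with 5 MB interpreted as 5 × 1024 × 1024 bytes by assumption.

```python
# Hypothetical helper: the skill itself does not expose this function.
# Threshold assumed from the guidance under "Tips & Limitations".
SYNC_MAX_BYTES = 5 * 1024 * 1024  # "5MB", read here as 5 MiB (assumption)


def choose_mode(size_bytes: int, page_count: int = 1) -> str:
    """Return "async" for multi-page or large files, otherwise "sync"."""
    if page_count > 1 or size_bytes > SYNC_MAX_BYTES:
        return "async"
    return "sync"


if __name__ == "__main__":
    print(choose_mode(size_bytes=200_000, page_count=1))
    print(choose_mode(size_bytes=200_000, page_count=40))
```

A caller would typically obtain `size_bytes` from the file system and `page_count` from whatever PDF library is already in use; neither lookup is shown here.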

Installation

To integrate this skill into your environment, use the OpenClaw package manager:

clawhub install openclaw/skills/skills/hiotec/paddleocr-doc-parsing-v2

After installation, set the following environment variables for authentication and endpoint configuration:

PADDLEOCR_ACCESS_TOKEN: Required for API access.
PADDLEOCR_API_URL: The endpoint for synchronous layout parsing.
PADDLEOCR_JOB_URL: Required for managing async jobs.
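Before running the skill, it can help to confirm that these variables are actually present. The following is a minimal sketch using only the Python standard library. The helper name `missing_env_vars` is hypothetical, and treating all three variables as mandatory is an assumption; a sync-only setup may not need PADDLEOCR_JOB_URL.

```python
import os

# Variable names as listed above; requiring all three is an assumption.
REQUIRED_VARS = (
    "PADDLEOCR_ACCESS_TOKEN",
    "PADDLEOCR_API_URL",
    "PADDLEOCR_JOB_URL",
)


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All PaddleOCR environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.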

Use Cases

Automated Data Entry: Automatically extract data from scanned invoices, receipts, and bank statements to populate accounting software.
Content Digitization: Convert archival PDF scans or historical images into searchable text formats for research and database indexing.
Accessibility Tools: Convert visual text blocks from images into clean, structured Markdown for text-to-speech accessibility features.
Technical Document Extraction: Use the built-in layout analysis to parse complex technical manuals that contain mixed content like diagrams, tables, and mathematical formulas.

Example Prompts

"Use paddleocr-doc-parsing to scan this invoice image and convert the table data into a JSON file."
"Please parse the attached large PDF document using the async mode to ensure all pages are processed correctly."
"Extract the text from this screenshot and save the output as a clean Markdown file in my documents folder."

Tips & Limitations

Timeout Considerations: Sync mode is best suited to single-page files; use the async flag for multi-page documents or PDFs exceeding 5 MB to avoid timeout errors.
Precision: Ensure the input image resolution is at least 300 DPI for optimal text recognition accuracy, especially for documents with handwritten notes.
Security: Be cautious when parsing sensitive documents. When handling confidential data, make sure PADDLEOCR_API_URL points to trusted, private infrastructure rather than a public endpoint.
Formatting: The tool outputs Markdown by default, which preserves document structure well but may require post-processing if you need strict CSV or Excel output for complex tables.
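The post-processing mentioned under Formatting might look like the sketch below, which converts a simple pipe-delimited Markdown table to CSV with the standard library. It assumes the table arrives in plain pipe syntax; the skill's actual table output may differ (for example, HTML tables for merged cells), and escaped pipes inside cells are not handled. The function name `markdown_table_to_csv` is hypothetical.

```python
import csv
import io


def markdown_table_to_csv(markdown: str) -> str:
    """Convert pipe-style Markdown table rows in `markdown` to CSV text."""
    rows = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # not a table row
        # Drop the leading pipe and, if present, the trailing pipe.
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        # Skip the header separator row, e.g. |---|:---:|
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "| Item | Qty |\n|------|-----|\n| Pen, blue | 2 |\n"
    print(markdown_table_to_csv(sample))
```

Using `csv.writer` rather than joining cells by hand means values containing commas or quotes are escaped correctly.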
