What This Skill Does

The pdf-ocr-layout skill is an automated pipeline that converts scanned, non-searchable PDFs into searchable documents while preserving the original visual layout. Rather than emitting raw text as many OCR tools do, it runs four stages (Split, OCR API, Layout PDF, and Merge) and places the extracted text back at its bounding-box coordinates. Font sizes are calibrated to the source dimensions so that the resulting PDF looks as close to the original as possible. This makes the skill well suited to books, academic papers, and official scanned documents where spatial context matters as much as the text itself.
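
The coordinate mapping and font calibration described above can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: it assumes the OCR service returns boxes in pixels with a top-left origin, and the names `px_box_to_pdf`, `fit_font_size`, and `approx_width` (with its 0.5-em average glyph width) are hypothetical. With reportlab installed, `pdfmetrics.stringWidth(text, font_name, size)` could replace the crude width estimate.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]


def px_box_to_pdf(box_px: Box, image_size_px: Tuple[int, int],
                  page_size_pt: Tuple[float, float]) -> Box:
    """Convert a top-left-origin pixel box (left, top, right, bottom) into a
    bottom-left-origin PDF box (x0, y0, x1, y1) measured in points."""
    img_w, img_h = image_size_px
    page_w, page_h = page_size_pt
    sx, sy = page_w / img_w, page_h / img_h
    left, top, right, bottom = box_px
    # PDF y grows upward, so the pixel "bottom" becomes the lower y value.
    return (left * sx, page_h - bottom * sy, right * sx, page_h - top * sy)


def approx_width(text: str, size: float) -> float:
    """Crude width estimate: assumes every glyph is 0.5 em wide (an assumption)."""
    return 0.5 * size * len(text)


def fit_font_size(text: str, box_pt: Box,
                  measure: Callable[[str, float], float] = approx_width,
                  max_size: float = 72.0) -> float:
    """Largest font size at which `text` fits the box in both width and height."""
    x0, y0, x1, y1 = box_pt
    box_w, box_h = x1 - x0, y1 - y0
    unit_width = measure(text, 1.0)  # text width scales linearly with font size
    by_width = box_w / unit_width if unit_width > 0 else max_size
    return min(by_width, box_h, max_size)


# Example: a 2480x3508 px scan (A4 at 300 dpi) placed on a 595x842 pt page.
pdf_box = px_box_to_pdf((248, 350.8, 1240, 420.96), (2480, 3508), (595.0, 842.0))
size = fit_font_size("Chapter One", pdf_box)
```

Passing the width measurement in as a callable keeps the fitting logic independent of any particular font library, so the same function works with real font metrics or with the rough estimate used here.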

Installation

To add this skill to your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/biabia-55/pdf-ocr-layout-free

Make sure Python 3.x is installed on your system. The pipeline depends on several libraries, which must be installed before the first run:

pip install pypdf reportlab Pillow requests
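
To confirm that the four libraries are importable before the first run, a quick check like the one below may help. It is a sketch that uses only the standard library; the helper name `missing_modules` is illustrative and not part of the skill. Note that Pillow is imported under the name `PIL`.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages installed above (Pillow is imported as PIL).
REQUIRED = ["pypdf", "reportlab", "PIL", "requests"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```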

Use Cases

This skill is aimed at researchers, librarians, and administrative professionals. Common scenarios include converting archived scans into searchable repositories, digitizing physical textbooks for accessibility, and extracting text from complex legal forms where column alignment must be preserved. Because the pipeline is stateful and can resume after an interruption, it is also useful for very large PDF files that might time out or fail in other OCR tools.

Example Prompts

"OCR this PDF at /documents/archive/scanned_book_01.pdf and make sure the text layout matches the original."
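"Convert a scanned PDF to text: please OCR /home/user/manual.pdf and preserve the layout formatting."
"Run OCR on this PDF. I need a copy of the scan that supports keyword search; the file path is /data/report.pdf."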

Tips & Limitations

Resumability: If the script is interrupted, re-run the same command. The work directory tracks job IDs and chunk results, so processing resumes where it left off.
Performance: API processing generally takes 1–5 minutes per 90-page chunk, so allow extra time for very large documents.
Image Handling: The pipeline focuses on text. Images are rendered as placeholders, or embedded when their URLs are accessible; complex graphic rendering is not a primary focus.
Input Requirements: Verify that your PDF is not password-protected, since encryption prevents the pipeline from reading the file content.
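
The chunking and resumability behavior described in these tips can be pictured as follows. This is a hedged sketch, not the skill's implementation: the `state.json` file name, its layout, and the `process_chunk` callback are assumptions introduced for illustration. Only the 90-page chunk size comes from the text above.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Tuple

CHUNK_PAGES = 90  # chunk size mentioned in the tips above


def plan_chunks(total_pages: int,
                chunk_pages: int = CHUNK_PAGES) -> List[Tuple[int, int]]:
    """Split pages 0..total_pages-1 into half-open (start, end) ranges."""
    return [(start, min(start + chunk_pages, total_pages))
            for start in range(0, total_pages, chunk_pages)]


def run_resumable(total_pages: int, work_dir: Path,
                  process_chunk: Callable[[int, int], str]) -> Dict[str, str]:
    """Process each chunk once, persisting results so a re-run skips finished work."""
    work_dir.mkdir(parents=True, exist_ok=True)
    state_file = work_dir / "state.json"  # hypothetical state file name
    state: Dict[str, str] = (
        json.loads(state_file.read_text()) if state_file.exists() else {}
    )
    for start, end in plan_chunks(total_pages):
        key = f"{start}-{end}"
        if key in state:
            continue  # already finished in an earlier run
        state[key] = process_chunk(start, end)
        state_file.write_text(json.dumps(state))  # save after every chunk
    return state
```

For a 200-page document, `plan_chunks(200)` yields three chunks, so at 1 to 5 minutes per chunk the API stage would take roughly 3 to 15 minutes if the chunks were processed one after another.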
