Official Verified productivity Safety 4/5

pdf-ocr-layout

Full OCR pipeline for scanned PDFs with layout preservation. Use this skill whenever the user wants to OCR a PDF, convert a scanned document to searchable text, or preserve the original layout of a scanned book or document. Triggers on: "OCR this PDF", "用PaddleOCR处理" ("process with PaddleOCR"), "识别这个PDF" ("recognize this PDF"), "扫描版PDF转文字" ("convert a scanned PDF to text"), "把这个PDF做OCR" ("run OCR on this PDF"), or when a PDF path is provided alongside any mention of OCR, text recognition, or layout preservation.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/biabia-55/pdf-ocr-layout-free

What This Skill Does

The pdf-ocr-layout skill is an automated pipeline that turns scanned, non-searchable PDFs into searchable documents while preserving the original visual layout. Rather than emitting raw text, it runs four stages (Split, OCR API, Layout PDF, and Merge) and places the recognized text back at its bounding-box coordinates. Font sizes are calibrated to the source dimensions so that the resulting PDF looks as close to the original as possible, which suits books, academic papers, and official scanned documents where spatial context matters as much as the text itself.
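The coordinate mapping and font calibration described above can be illustrated with a minimal sketch. This is not the skill's actual code: the `PixelBox` type, both function names, and the `avg_glyph_em` ratio are assumptions made for illustration. It assumes the OCR stage reports boxes in image pixels with a top-left origin, while PDF pages are measured in points from the bottom-left corner.

```python
from dataclasses import dataclass


@dataclass
class PixelBox:
    """OCR bounding box in image pixels, origin at the top-left (assumed format)."""
    left: float
    top: float
    right: float
    bottom: float


def to_pdf_points(box, image_w_px, image_h_px, page_w_pt, page_h_pt):
    """Map a pixel box to (x, y, width, height) in PDF points.

    PDF user space starts at the bottom-left corner, so the y axis is flipped.
    """
    sx = page_w_pt / image_w_px
    sy = page_h_pt / image_h_px
    x = box.left * sx
    width = (box.right - box.left) * sx
    height = (box.bottom - box.top) * sy
    y = page_h_pt - box.bottom * sy  # bottom edge of the box, measured from the page bottom
    return x, y, width, height


def fit_font_size(text, box_w_pt, box_h_pt, avg_glyph_em=0.5):
    """Pick the largest font size that keeps `text` inside the box.

    avg_glyph_em is a crude assumption (average glyph width as a fraction of
    the font size); a real pipeline would measure the string with actual font
    metrics instead.
    """
    if not text:
        return box_h_pt
    width_at_1pt = len(text) * avg_glyph_em  # estimated text width at font size 1
    return min(box_h_pt, box_w_pt / width_at_1pt)
```

Text width grows linearly with font size, so dividing the box width by the width at size 1 gives the size at which the line just fits; the box height caps the result so short strings do not overflow vertically.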

Installation

To integrate this skill into your OpenClaw environment, use the built-in package manager by executing the following command in your terminal:

clawhub install openclaw/skills/skills/biabia-55/pdf-ocr-layout-free

The pipeline requires Python 3.x and a few third-party libraries. Install them before the first run:

pip install pypdf reportlab Pillow requests
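To confirm that the libraries are importable before starting a long job, a small check such as the following may help. It is a sketch, not part of the skill; the `missing_packages` helper is hypothetical. Note that the Pillow package is imported under the module name `PIL`.

```python
import importlib.util

# pip package name -> importable module name
REQUIRED = {
    "pypdf": "pypdf",
    "reportlab": "reportlab",
    "Pillow": "PIL",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pkg for pkg, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```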

Use Cases

This skill is aimed at researchers, librarians, and administrative professionals. Common scenarios include converting archived scans into searchable repositories, digitizing physical textbooks for better accessibility, and extracting text from complex legal forms where column alignment must be preserved. Because the pipeline is stateful (progress is saved in a work directory and an interrupted run can resume), it is particularly useful for very large PDF files that might otherwise time out or fail partway through.

Example Prompts

  1. "OCR this PDF at /documents/archive/scanned_book_01.pdf and make sure the text layout matches the original."
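  2. "扫描版PDF转文字:请帮我识别 /home/user/manual.pdf,要求保留排版格式。" (English: "Convert a scanned PDF to text: please recognize /home/user/manual.pdf and keep the layout formatting.")
  3. "把这个PDF做OCR,我需要一个可以搜索关键词的扫描版复印件,文件路径是 /data/report.pdf。" (English: "Run OCR on this PDF. I need a scanned copy that can be searched by keyword; the file path is /data/report.pdf.")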

Tips & Limitations

  • Resumability: If the script is interrupted, simply re-run the exact command. The work directory tracks job IDs and chunk results, allowing you to resume exactly where you left off.
  • Performance: API processing generally takes 1–5 minutes per 90-page chunk. Plan your time accordingly for very large documents.
  • Image Handling: The pipeline focuses on text. Images are embedded when their URLs are accessible and are otherwise replaced with placeholders; complex graphic rendering is not a primary goal.
  • Password Protection: Verify that your PDF is not password-protected before running, as protection prevents the pipeline from accessing the file content.
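The resumability and performance tips can be made concrete with a short sketch. It assumes 90-page chunks processed one after another and a JSON state file in the work directory; the file name `state.json`, the key format, and all function names are hypothetical and not taken from the skill itself.

```python
import json
from pathlib import Path

CHUNK_PAGES = 90  # chunk size quoted in the Performance tip


def plan_chunks(total_pages, chunk_pages=CHUNK_PAGES):
    """Return 1-based inclusive (start, end) page ranges covering the document."""
    return [
        (start, min(start + chunk_pages - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_pages)
    ]


def estimate_minutes(total_pages, per_chunk=(1, 5)):
    """Rough (low, high) estimate, assuming chunks are processed sequentially."""
    chunks = len(plan_chunks(total_pages))
    return chunks * per_chunk[0], chunks * per_chunk[1]


def load_state(work_dir):
    """Read the saved chunk results, or an empty dict on a fresh run."""
    path = Path(work_dir) / "state.json"  # hypothetical file name
    return json.loads(path.read_text()) if path.exists() else {}


def save_state(work_dir, state):
    """Persist chunk results so a re-run can skip finished chunks."""
    path = Path(work_dir) / "state.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))


def pending_chunks(total_pages, state):
    """Chunks that have no recorded result yet."""
    return [c for c in plan_chunks(total_pages) if f"{c[0]}-{c[1]}" not in state]
```

Under these assumptions, a 200-page scan splits into three chunks and would take roughly 3 to 15 minutes of API time. After an interruption, only the chunks missing from the state file need to be submitted again.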

Metadata

Author: @biabia-55
Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-biabia-55-pdf-ocr-layout-free": {
      "enabled": true,
      "auto_update": true
    }
  }
}
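If clawhub.json already lists other plugins, pasting the snippet over the file would drop them. The following sketch merges the entry into an existing file instead. It assumes the file is plain JSON with a top-level "plugins" object, as in the snippet above; the `enable_plugin` helper is hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-biabia-55-pdf-ocr-layout-free"


def enable_plugin(config_path, plugin_id=PLUGIN_ID):
    """Add or update the plugin entry while keeping existing settings."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `enable_plugin("clawhub.json")` from the directory that holds the configuration file creates the file if it is missing and leaves other plugin entries untouched.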

Tags(AI)

#ocr #pdf #productivity #document-processing #digitization
Safety Score: 4/5

Flags: file-write, file-read, external-api, code-execution