ClawKit Reliability Toolkit
Official · Verified · data analysis · Safety 4/5

Pdf Ocr Layout

Skill by baokui


What This Skill Does

The Pdf Ocr Layout skill is a multimodal document parsing engine that turns static documents into structured, machine-readable data. It chains three models: GLM-OCR extracts the structural layout, GLM-4.7 handles logical textual reasoning, and GLM-4.6V performs visual analysis. Together they let the skill extract data tables as clean Markdown, save charts and illustrations as separate image files, and interpret what those tables and figures mean in the context of their original page.
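This page does not document the exact output format. As a rough illustration of what "clean Markdown" for an extracted table usually means, here is a minimal sketch of a hypothetical helper, `rows_to_markdown`, that renders a header and body rows as a pipe table. It is not part of the skill's API; it only shows the target shape.

```python
def rows_to_markdown(header: list[str], rows: list[list[str]]) -> str:
    """Render a header plus body rows as a GitHub-style Markdown pipe table.

    Hypothetical helper for illustration; not part of the Pdf Ocr Layout skill.
    """
    def escape(cell: str) -> str:
        # A literal pipe would split the cell in two, so escape it.
        return str(cell).replace("|", "\\|").strip()

    width = len(header)
    lines = [
        "| " + " | ".join(escape(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        # Pad short rows (and trim long ones) to the header's column count.
        padded = list(row) + [""] * (width - len(row))
        lines.append("| " + " | ".join(escape(c) for c in padded[:width]) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown(["Quarter", "Revenue"], [["Q1", "1.2M"], ["Q2", "1.5M"]]))
```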

Installation

To install this skill, use the OpenClaw CLI tool from your terminal. Ensure you have the necessary environment permissions to download packages and access the source repository:

clawhub install openclaw/skills/skills/baokui/pdf-ocr-layout

Use Cases

  • Financial Reporting: Automatically extract and analyze tabular financial data from quarterly PDF reports, turning the raw figures into clean Markdown tables ready for spreadsheet import.
  • Technical Documentation: Convert dense engineering manuals into structured knowledge bases by separating diagrams and flowcharts from descriptive text.
  • Academic Research: Parse research papers to extract experimental charts, using multimodal analysis to summarize visual findings in natural language.
  • Compliance Auditing: Efficiently scan large batches of documents to locate and interpret specific table data or imagery required for regulatory compliance.
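For the Financial Reporting case, Markdown tables usually need converting before a spreadsheet will import them. The sketch below is a hypothetical `markdown_table_to_csv` helper built only on Python's standard `csv` module. It assumes the skill emits simple pipe tables (one row per line, with a `---` separator under the header); merged or multi-line cells are not handled.

```python
import csv
import io
import re


def markdown_table_to_csv(markdown: str) -> str:
    """Convert a simple pipe-delimited Markdown table into CSV text.

    Hypothetical helper; assumes one table row per line.
    """
    rows = []
    for raw in markdown.strip().splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # ignore any prose around the table
        # Drop the outer pipes, keeping a trailing escaped pipe (\|) intact.
        inner = line[1:]
        if inner.endswith("|") and not inner.endswith("\\|"):
            inner = inner[:-1]
        # Split only on unescaped pipes, then unescape the cell text.
        cells = [c.strip().replace("\\|", "|") for c in re.split(r"(?<!\\)\|", inner)]
        # Skip the header separator row, e.g. | --- | :---: |
        if all(re.fullmatch(r":?-{3,}:?", c) for c in cells):
            continue
        rows.append(cells)
    buffer = io.StringIO()
    # csv.writer quotes any cell containing a comma, such as "1,200".
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    table = "| Quarter | Revenue |\n| --- | --- |\n| Q1 | 1,200 |\n| Q2 | 1,500 |"
    print(markdown_table_to_csv(table), end="")
```

Writing the returned string to a `.csv` file gives something most spreadsheet tools can open directly.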

Example Prompts

  1. "Open the document at /data/financials.pdf, extract all the quarterly growth tables into Markdown, and analyze the trends shown in the charts on page 4."
  2. "Look at the technical report /data/specs.png, crop all the circuit diagram images, and provide a textual explanation of each diagram's function."
  3. "Please parse the document in /data/report.pdf and perform a logical analysis on the main data table, focusing specifically on the year-over-year cost variations."

Tips & Limitations

  • Pre-Processing: For best results with scanned physical documents, ensure image resolution is at least 300 DPI to allow GLM-OCR to identify elements accurately.
  • Large Files: For PDFs running to several hundred pages, consider splitting the file into smaller chunks; processing every page in a single run may exhaust the available temporary memory.
  • Dependencies: This skill relies on external GLM models; make sure your API keys or cloud environment are configured to reach the Zhipu AI inference services.
  • Output Management: The output_dir parameter is mandatory. Ensure your environment has write access to the target directory to save the cropped assets.
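Some of these tips can be checked before a run. The helpers below are a hypothetical sketch, not part of the skill: `meets_min_dpi` compares a scan's pixel size with the 300 DPI guideline, `page_chunks` plans page ranges for splitting a long PDF (the chunk size is your choice, since this page states no limit), and `output_dir_writable` confirms that the directory you plan to pass as output_dir accepts new files.

```python
import os
import tempfile


def meets_min_dpi(pixel_w: int, pixel_h: int, page_w_in: float, page_h_in: float,
                  min_dpi: float = 300) -> bool:
    """Return True if a scan reaches at least min_dpi along both page axes."""
    # Effective DPI is pixels divided by the physical page size in inches.
    return pixel_w / page_w_in >= min_dpi and pixel_h / page_h_in >= min_dpi


def page_chunks(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [(start, min(start + chunk_size - 1, total_pages))
            for start in range(1, total_pages + 1, chunk_size)]


def output_dir_writable(path: str) -> bool:
    """Create path if missing, then confirm a file can be written inside it."""
    try:
        os.makedirs(path, exist_ok=True)
        # The temporary file is deleted again as soon as the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False
```

For example, a US Letter page (8.5 × 11 in) scanned at 2550 × 3300 pixels is exactly 300 DPI, and `page_chunks(250, 100)` returns `[(1, 100), (101, 200), (201, 250)]`, ranges you could hand to whichever PDF splitting tool you already use.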

Metadata

Author: @baokui
Stars: 4473
Views: 1
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-baokui-pdf-ocr-layout": {
      "enabled": true,
      "auto_update": true
    }
  }
}
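If clawhub.json already lists other plugins, pasting the snippet by hand can overwrite them. The sketch below merges the entry instead. It assumes clawhub.json is a JSON object with a top-level "plugins" map, as the snippet suggests, and `enable_plugin` is a hypothetical helper. This page does not say where clawhub.json lives, so pass whatever path your setup uses.

```python
import json
import tempfile
from pathlib import Path

# Plugin key copied from the configuration snippet.
PLUGIN_ID = "official-baokui-pdf-ocr-layout"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update one plugin entry in clawhub.json, keeping everything else."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    plugins = config.setdefault("plugins", {})
    # Update in place so any extra keys already on the entry survive.
    plugins.setdefault(plugin_id, {}).update({"enabled": True, "auto_update": True})
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real clawhub.json.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "clawhub.json"
        path.write_text('{"plugins": {"some-other-plugin": {"enabled": true}}}')
        print(json.dumps(enable_plugin(path), indent=2))
```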

Tags (AI)

#ocr #pdf-parsing #data-extraction #multimodal #document-analysis
Safety Score: 4/5

Flags: file-write, file-read, external-api