Official | Verified | file management | Safety 5/5

pymupdf-pdf

Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders.

Why use this skill?

Efficiently parse PDFs locally with the OpenClaw PyMuPDF skill. Get high-speed Markdown or JSON output for quick document ingestion and automation.

What This Skill Does

The pymupdf-pdf skill gives the OpenClaw agent a fast, local PDF extraction interface. Built on the PyMuPDF (fitz) library, it converts PDF documents into clean Markdown or structured JSON. Unlike heavy-duty OCR tools, it is optimized for speed, which makes it a good fit for processing large volumes of standard documents or for quick document ingestion when latency is the main concern. It also supports optional image extraction and basic table parsing, and it writes its output to a separate directory per document to keep the workspace organized.
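To make the conversion concrete, here is a minimal sketch of page-by-page text extraction with PyMuPDF followed by Markdown and JSON rendering. It is not the skill's actual implementation: the function names, the "Page N" heading layout, and the JSON field names are all assumptions chosen for illustration.

```python
import json


def extract_pages(pdf_path: str) -> list[str]:
    """Return the plain text of each page, using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helpers below work without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text("text") for page in doc]


def pages_to_markdown(title: str, pages: list[str]) -> str:
    """Render page texts as Markdown (hypothetical layout: one heading per page)."""
    parts = [f"# {title}"]
    for number, text in enumerate(pages, start=1):
        parts.append(f"## Page {number}\n\n{text.strip()}")
    return "\n\n".join(parts) + "\n"


def pages_to_json(title: str, pages: list[str]) -> str:
    """Render page texts as JSON (hypothetical schema: title plus a list of pages)."""
    payload = {
        "title": title,
        "pages": [
            {"page": number, "text": text.strip()}
            for number, text in enumerate(pages, start=1)
        ],
    }
    return json.dumps(payload, indent=2, ensure_ascii=False)
```

With PyMuPDF installed, pages_to_markdown("quarterly_report", extract_pages("quarterly_report.pdf")) would produce one Markdown string for the whole document. The rendering helpers are kept separate from the extraction call so they can be used on any list of page texts.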

Installation

To add this skill to your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/kesslerio/pymupdf-pdf-parser-clawdbot-skill

Make sure the skill's dependencies are available in your environment. If you encounter errors about missing libraries or Nix-specific linking issues, consult references/pymupdf-notes.md in the skill source repository.

Use Cases

This skill works best where processing speed matters more than complex layout reconstruction. Use cases include:

  • Rapidly scanning internal documentation or manuals to populate an agent's knowledge base.
  • Bulk-processing PDF archives where high-fidelity OCR is unnecessary.
  • Serving as a secondary, fallback parser when heavier OCR systems (like MinerU) are currently unavailable or failing on simple documents.
  • Extracting textual content from clean, machine-generated PDFs to facilitate downstream RAG (Retrieval-Augmented Generation) workflows.
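The fallback use case above can be sketched as a small parser chain that tries each parser in order and returns the first successful result. Everything here is hypothetical: heavy_parser and fast_parser are stand-ins for a heavier OCR parser and for this skill, not real APIs of either tool.

```python
from typing import Callable, Iterable

Parser = Callable[[str], str]


def parse_with_fallback(
    pdf_path: str, parsers: Iterable[tuple[str, Parser]]
) -> tuple[str, str]:
    """Try each (name, parser) pair in order; return (name, text) from the first success."""
    errors: list[str] = []
    for name, parser in parsers:
        try:
            return name, parser(pdf_path)
        except Exception as exc:  # a real agent would likely catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all parsers failed: " + "; ".join(errors))


def heavy_parser(pdf_path: str) -> str:
    """Hypothetical stand-in for a heavier OCR parser that is currently down."""
    raise ConnectionError("service unavailable")


def fast_parser(pdf_path: str) -> str:
    """Hypothetical stand-in for the fast PyMuPDF-based parser."""
    return f"text of {pdf_path}"
```

Calling parse_with_fallback("manual.pdf", [("heavy", heavy_parser), ("fast", fast_parser)]) falls through to the fast parser. Returning the parser name alongside the text lets the caller record which parser actually produced the output.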

Example Prompts

  1. "Use the pymupdf-pdf skill to parse the quarterly_report.pdf file and output the results in Markdown format to my ./parsed-docs directory."
  2. "Extract all text and images from contract_final.pdf using the pymupdf-pdf parser."
  3. "Run a quick parse on the technical-manual.pdf, generate both Markdown and JSON outputs, and include table extraction to help me identify the data structure."

Tips & Limitations

  • Speed vs. Robustness: PyMuPDF is extremely fast but lacks the complex multi-column layout analysis of more advanced OCR tools. For documents with complex headers, multi-column layouts, or heavy formatting, expect occasional structural errors.
  • Image/Table Quality: Image extraction is straightforward, but table parsing is limited to line-based estimation; do not rely on it for complex financial spreadsheets.
  • Organization: By default, the skill writes each document's output to its own subdirectory. Avoid changing the --outroot to a root directory unless you have a specific reason to bypass this automatic organization.
  • Troubleshooting: If the parser struggles with a specific file, check the PDF for security permissions or non-standard encoding, then consider falling back to a more intensive OCR tool if required.
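Two of the tips above lend themselves to a short sketch: deriving a per-document output folder, and checking a PDF before parsing it. The folder naming rule (output root plus the PDF's file name without extension) and both function names are assumptions for illustration, not the skill's documented behavior. The pre-check uses PyMuPDF's Document.needs_pass property and Page.get_text.

```python
from pathlib import Path


def output_dir_for(pdf_path: str, outroot: str) -> Path:
    """Assumed layout: <outroot>/<PDF file name without extension>."""
    return Path(outroot) / Path(pdf_path).stem


def preflight(pdf_path: str) -> list[str]:
    """Return a list of likely problems found before parsing (empty if none)."""
    import fitz  # PyMuPDF; imported lazily so output_dir_for works without it

    problems: list[str] = []
    with fitz.open(pdf_path) as doc:
        if doc.needs_pass:
            # Pages cannot be read until the document is authenticated.
            problems.append("password required")
        elif not any(page.get_text("text").strip() for page in doc):
            # No text layer at all usually means scanned pages.
            problems.append("no extractable text; consider an OCR tool")
    return problems
```

For example, output_dir_for("docs/quarterly_report.pdf", "./parsed-docs") gives parsed-docs/quarterly_report, so two PDFs parsed into the same output root do not overwrite each other.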

Metadata

Author: @kesslerio
Stars: 1776
Views: 0
Updated: 2026-03-02

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-kesslerio-pymupdf-pdf-parser-clawdbot-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}
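If clawhub.json already lists other plugins, pasting the snippet over the file would drop them. A minimal sketch of merging the entry instead, assuming clawhub.json is plain JSON with a top-level "plugins" object as shown above (the helper names are hypothetical):

```python
import json
from pathlib import Path

PLUGIN_ID = "official-kesslerio-pymupdf-pdf-parser-clawdbot-skill"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of config with the plugin entry added, keeping other plugins."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def enable_plugin_in_file(path: str = "clawhub.json") -> None:
    """Load the config file (or start empty), add the plugin entry, and write it back."""
    file = Path(path)
    config = json.loads(file.read_text()) if file.exists() else {}
    file.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

enable_plugin leaves its input untouched and returns a new dictionary, which makes it easy to inspect the result before writing anything to disk.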

Tags (AI)

#pdf #parsing #extraction #automation #local-tools

Safety Score: 5/5

Flags: file-write, file-read