Official | Verified | file management | Safety 4/5

mineru-pdf

Parse PDFs locally (CPU) into Markdown/JSON using MinerU. Assumes MinerU creates per‑doc output folders; supports table/image extraction.

Why use this skill?

Convert PDFs to Markdown or JSON locally with MinerU. High-fidelity document parsing for text, tables, and images without cloud dependencies.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/kesslerio/mineru-pdf-parser-clawdbot-skill
What This Skill Does

The mineru-pdf skill converts complex PDF documents into structured, machine-readable formats using a local-first workflow. It is built on the MinerU engine and runs entirely on your CPU, so documents stay on your machine and processing works offline. MinerU analyzes document layout, which lets the skill extract text, tables, and images with high fidelity. For academic papers, technical documentation, or financial reports, the skill turns static PDF content into Markdown or JSON that is ready for downstream LLM analysis or database ingestion.

Installation

To integrate this skill into your environment, use the OpenClaw installation command:

clawhub install openclaw/skills/skills/kesslerio/mineru-pdf-parser-clawdbot-skill

Make sure the system dependencies that MinerU requires are installed on your machine. Once installed, the skill exposes the mineru_parse.sh script, which is the primary interface for PDF processing tasks.

Use Cases

  • Research Extraction: Parse dense academic PDFs into Markdown to summarize findings or populate knowledge bases.
  • Data Digitization: Convert tabular data trapped in PDF reports into structured JSON for automated analysis or spreadsheet ingestion.
  • Content Migration: Transform legacy PDF manuals into modern Markdown documents while preserving structural hierarchy.
  • Local Data Privacy: Process sensitive documentation entirely on-device without uploading files to third-party cloud parsing services.

Example Prompts

  1. "Please parse the document located at /data/research/2023_report.pdf and output the result in Markdown format."
  2. "Extract the tables and images from the file /docs/manual.pdf using the mineru-pdf skill and save the output to the default directory."
  3. "Convert /finance/q4_statement.pdf into a structured JSON file so I can programmatically analyze the financial data."

Tips & Limitations

  • Performance: Because this skill performs heavy computation locally on the CPU, large documents or high-resolution images may increase processing time. Ensure your environment has sufficient available RAM.
  • Output Management: The skill assumes MinerU creates one output folder per document. All output is written under ./mineru-output/<basename>/ to prevent directory clutter. Check the file paths in your logs after each run.
  • Reference Documentation: For complex configurations, such as tuning backend methods or multi-threading, consult the references/mineru-cli.md file included in the skill repository.
  • Batch Processing: This skill is optimized for single-file processing. If you have large archives, consider writing a simple shell loop to iterate through your directory rather than forcing batch flags.
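
The shell loop suggested in the Batch Processing tip could be sketched roughly as follows. This is a minimal sketch, not the skill's documented interface. It assumes that mineru_parse.sh accepts the PDF path as its only argument and that <basename> means the file name without the .pdf extension. PARSE_CMD, output_dir_for, and batch_parse are hypothetical names introduced for illustration; check references/mineru-cli.md for the actual options.

```shell
#!/usr/bin/env bash
# Sketch only. PARSE_CMD is an assumption: point it at however you
# actually invoke the skill's script in your environment.
PARSE_CMD="${PARSE_CMD:-./mineru_parse.sh}"

# Assumed output location for one PDF:
# ./mineru-output/<file name without the .pdf extension>
output_dir_for() {
  local name
  name="$(basename "$1")"
  printf './mineru-output/%s\n' "${name%.pdf}"
}

# Parse every *.pdf in a directory, one file per invocation.
# A failed file is reported and counted; the loop keeps going.
batch_parse() {
  local dir="$1" failed=0 pdf
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # the glob matched nothing
    if "$PARSE_CMD" "$pdf"; then
      echo "ok: $pdf -> $(output_dir_for "$pdf")"
    else
      echo "failed: $pdf" >&2
      failed=$((failed + 1))
    fi
  done
  # Exit status is 0 only when every file parsed successfully.
  [ "$failed" -eq 0 ]
}
```

After sourcing this file, run batch_parse /path/to/pdfs. Calling the script once per file means a single failing document does not stop the rest of the archive from being processed.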

Metadata

Author: @kesslerio
Stars: 1776
Views: 6
Updated: 2026-03-02

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-kesslerio-mineru-pdf-parser-clawdbot-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#pdf-parsing #markdown #ocr #local-first #document-analysis

Safety Score: 4/5

Flags: file-write, file-read, code-execution