Official Verified file management Safety 4/5

pdf-process-mineru

PDF document parsing tool based on local MinerU, supports converting PDF to Markdown, JSON, and other machine-readable formats.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/baokui/pdf-parser-mineru

What This Skill Does

The pdf-process-mineru skill is a document conversion tool for OpenClaw agents. Built on the MinerU framework, it lets agents convert complex PDF files into machine-readable formats such as Markdown and structured JSON. Unlike plain text extractors, it preserves document structure by identifying and formatting formulas, tables, and image placements. For academic papers, technical documentation, or financial reports, it turns static visual documents into searchable, editable content.

Installation

To integrate this skill into your environment, run the following command in your OpenClaw terminal:

clawhub install openclaw/skills/skills/baokui/pdf-parser-mineru

Ensure that your environment meets the dependencies required by MinerU, including necessary OCR engines, to take full advantage of the advanced layout analysis features.
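
Before the first conversion, it can help to confirm that a MinerU command-line entry point is reachable. The sketch below assumes the executable is named `mineru`, which this page does not confirm; adjust the name to match your installation.

```python
import shutil


def tool_available(executable: str = "mineru") -> bool:
    """Return True if `executable` can be found on PATH.

    The default name "mineru" is an assumption, not something this
    skill page states.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if tool_available():
        print("MinerU executable found on PATH")
    else:
        print("MinerU executable not found; check the MinerU installation")
```

This only checks that the executable exists; it does not verify OCR engines or model weights.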

Use Cases

Research & Academic Analysis: Quickly convert dense research papers into Markdown to feed into LLMs for summarization, synthesis, or querying specific formulas and data points.
Financial Reporting: Extract complex tables from annual PDF reports into structured JSON format for automated spreadsheet ingestion or database population.
Knowledge Base Automation: Batch process legacy PDF manuals into clean Markdown files to update enterprise documentation portals with minimal manual formatting effort.
Data Extraction: Programmatically pull data from scanned documents by leveraging the hybrid OCR engines that MinerU provides.
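
As a rough illustration of the financial-reporting use case, the sketch below filters table blocks out of a JSON result. It assumes the JSON output is a list of content blocks, each with a `type` field, and that table blocks carry `page_idx` and `table_body` fields. These field names and the sample data are assumptions for illustration; inspect the actual output before relying on them.

```python
import json
from typing import Any


def extract_tables(blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the blocks whose (assumed) "type" field is "table"."""
    return [block for block in blocks if block.get("type") == "table"]


# Hypothetical sample that mimics the assumed output layout.
sample = json.loads("""
[
  {"type": "text", "page_idx": 0, "text": "Annual Report"},
  {"type": "table", "page_idx": 3,
   "table_body": "<table><tr><td>Revenue</td><td>100</td></tr></table>"},
  {"type": "image", "page_idx": 4}
]
""")

tables = extract_tables(sample)
for table in tables:
    # Print the page index and the length of the table markup.
    print(table["page_idx"], len(table["table_body"]))
```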

Example Prompts

"Please convert the PDF located at /docs/annual_report.pdf to Markdown and save the output to /reports/analysis/, ensuring you capture all the tables."
"Can you process pages 10 through 20 of /research/physics_paper.pdf using the pipeline backend and give me the JSON representation of the layout?"
"Convert /manuals/system_guide.pdf into Markdown format, make sure you enable both formula and table recognition so the output is accurate."
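
A prompt like the second one above might translate into a command-line call along these lines. This is a sketch only: the executable name `mineru`, the flag spellings (`-p`, `-o`, `-b`, `-s`, `-e`, `-f`, `-t`), and the output directory are assumptions modeled on a MinerU-style CLI, not something this page documents. The function builds the argument list without running it.

```python
from typing import Optional


def build_command(
    pdf_path: str,
    output_dir: str,
    backend: Optional[str] = None,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    formula: bool = True,
    table: bool = True,
) -> list[str]:
    """Assemble a hypothetical MinerU-style argument list."""
    cmd = ["mineru", "-p", pdf_path, "-o", output_dir]  # assumed flags
    if backend is not None:
        cmd += ["-b", backend]
    if start_page is not None:
        cmd += ["-s", str(start_page)]
    if end_page is not None:
        cmd += ["-e", str(end_page)]
    # Formula and table recognition toggles (assumed flag names).
    cmd += ["-f", str(formula).lower(), "-t", str(table).lower()]
    return cmd


cmd = build_command(
    "/research/physics_paper.pdf",
    "/research/out",  # hypothetical output directory
    backend="pipeline",
    start_page=10,
    end_page=20,
)
print(" ".join(cmd))
```

Whether page numbers are zero-based or one-based is not stated here, so check the full documentation before mapping "pages 10 through 20" onto these parameters.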

Tips & Limitations

Backend Selection: The hybrid-auto-engine is suitable for most general tasks. If you are processing documents with very complex layouts, try the vlm-auto-engine for superior visual comprehension.
Performance: High-resolution OCR and layout analysis are computationally expensive. Processing very long PDFs can take significant time; consider using the start_page and end_page parameters for granular tasks.
Output Integrity: MinerU is generally accurate, but verify complex nested tables after conversion, because non-standard document formatting can occasionally produce structural artifacts.
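
One way to apply the performance tip is to split a long PDF into page ranges and pass each range as start_page and end_page. The helper below is a minimal sketch that assumes zero-based, inclusive page indices; that convention is an assumption and should be checked against the full documentation.

```python
def page_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split `total_pages` into (start_page, end_page) pairs.

    Indices are zero-based and inclusive (an assumed convention).
    """
    if total_pages < 0 or chunk_size <= 0:
        raise ValueError("total_pages must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, total_pages) - 1)
        for start in range(0, total_pages, chunk_size)
    ]


ranges = page_ranges(45, 20)
print(ranges)  # [(0, 19), (20, 39), (40, 44)]
```

Processing each range separately also makes it easier to retry a single failed chunk without repeating the whole document.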

Metadata

Author: @baokui

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-baokui-pdf-parser-mineru": {
      "enabled": true,
      "auto_update": true
    }
  }
}
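
If clawhub.json already contains other plugins, pasting the snippet by hand risks overwriting them. The sketch below merges the entry programmatically. It assumes the file is plain JSON with a top-level "plugins" object, as the snippet above suggests; the helper names are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-baokui-pdf-parser-mineru"


def merge_plugin(config: dict) -> dict:
    """Return a copy of `config` with the plugin entry enabled."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def enable_plugin(config_path: Path) -> None:
    """Read clawhub.json if it exists, merge the entry, and write it back."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    merged = merge_plugin(config)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


# Example: existing plugins are preserved by the merge.
existing = {"plugins": {"some-other-plugin": {"enabled": False}}}
result = merge_plugin(existing)
```

Call `enable_plugin(Path("clawhub.json"))` from the directory that holds your configuration file.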

Tags(AI)

#pdf #ocr #parsing #markdown #data-extraction

Safety Score: 4/5

Flags: file-read, file-write

Related Skills

Pdf Ocr Layout

Skill by baokui

baokui 4473

llm-video-generator

Generate videos from text descriptions using the ZhipuAI CogVideoX-3 model. Supports text-to-video, image-to-video, and first/last frame-to-video generation. Automatically handles long videos (over 5s) by chaining multiple generation calls with last-frame continuation. Use when the user asks to create or generate a video from text, make a video, text-to-video, 文生视频 (text-to-video), 生成视频 (generate a video), 做个视频 (make a video), or any request involving converting text or images into a video. Supports configuring video content, style, resolution (up to 4K), frame rate (30/60fps), audio, and duration.

baokui 4473

wan-t2i

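Alibaba Cloud DashScope Wan2.6 text-to-image tool. Generates images with the Wan2.6-t2i model on Alibaba Cloud's Bailian platform. Triggered when the user needs AI image generation, text-to-image, or an image generated from text. Requires the DASHSCOPE_API_KEY environment variable (already configured in the system).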

baokui 4473

glm-v-model

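Skill for calling the Zhipu GLM-4V/4.6V vision models. Used for image and video understanding, multimodal dialogue, chart analysis, and similar tasks. Use this skill when the user mentions image understanding, image recognition, vision models, GLM-4V, GLM-4.6V, multimodal analysis, describing a picture, chart analysis, or video understanding.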

baokui 4473