pdf-process-mineru
PDF document parsing tool built on a local MinerU installation. It converts PDFs to Markdown, JSON, and other machine-readable formats.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/baokui/pdf-parser-mineru
What This Skill Does
The pdf-process-mineru skill is a document conversion engine for OpenClaw agents. Built on the MinerU framework, it lets agents transform complex PDF files into machine-readable formats such as Markdown and structured JSON. Unlike simple text extractors, it preserves document structure by identifying and formatting formulas, tables, and image placements. For academic papers, technical documentation, or financial reports, the skill turns static visual documents into searchable, editable, structured content.
Installation
To integrate this skill into your environment, run the following command in your OpenClaw terminal:
clawhub install openclaw/skills/skills/baokui/pdf-parser-mineru
Ensure that your environment meets the dependencies required by MinerU, including necessary OCR engines, to take full advantage of the advanced layout analysis features.
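As a rough pre-flight check for the dependency note above, the sketch below reports which command-line tools are not visible on PATH. The tool name mineru is an assumption about how MinerU is exposed in your environment; this page does not list the exact dependencies, so adjust the list to match your setup.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# "mineru" is an assumed executable name, not something this page confirms.
missing = find_missing_tools(["mineru"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

A check like this only confirms that an executable exists; it does not verify that OCR models or GPU support are configured.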
Use Cases
- Research & Academic Analysis: Quickly convert dense research papers into Markdown to feed into LLMs for summarization, synthesis, or querying specific formulas and data points.
- Financial Reporting: Extract complex tables from annual PDF reports into structured JSON format for automated spreadsheet ingestion or database population.
- Knowledge Base Automation: Batch process legacy PDF manuals into clean Markdown files to update enterprise documentation portals with minimal manual formatting effort.
- Data Extraction: Programmatically pull data from scanned documents by leveraging the hybrid OCR engines that MinerU provides.
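To make the table-extraction use case more concrete, here is a minimal sketch that filters table entries out of a parsed JSON result. The structure is an assumption: it supposes the JSON is a list of entries carrying a "type" field, with tables holding "table_body" (HTML), "table_caption", and "page_idx". This page does not document the skill's JSON schema, so inspect real output before relying on these keys.

```python
def extract_tables(content_list):
    """Collect table entries from a list of parsed content entries.

    The keys "type", "table_body", "table_caption", and "page_idx" are
    assumed field names; verify them against the JSON you actually get.
    """
    tables = []
    for entry in content_list:
        if entry.get("type") == "table":
            tables.append({
                "page": entry.get("page_idx"),
                "caption": " ".join(entry.get("table_caption", [])),
                "html": entry.get("table_body", ""),
            })
    return tables


# Hypothetical sample data, shaped like the assumed schema.
sample = [
    {"type": "text", "text": "Annual results", "page_idx": 0},
    {
        "type": "table",
        "table_caption": ["Revenue by quarter"],
        "table_body": "<table><tr><td>Q1</td><td>10</td></tr></table>",
        "page_idx": 3,
    },
]

for table in extract_tables(sample):
    print(table["page"], table["caption"])
```

Keeping the page index alongside each table makes it easier to spot-check the extracted data against the source PDF.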
Example Prompts
- "Please convert the PDF located at /docs/annual_report.pdf to Markdown and save the output to /reports/analysis/, ensuring you capture all the tables."
- "Can you process pages 10 through 20 of /research/physics_paper.pdf using the pipeline backend and give me the JSON representation of the layout?"
- "Convert /manuals/system_guide.pdf into Markdown format, make sure you enable both formula and table recognition so the output is accurate."
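The second prompt above names a human-readable page range, which an agent has to translate into page parameters. The sketch below converts 1-based page numbers to zero-based, inclusive start_page and end_page values. Zero-based indexing is an assumption here (it matches how MinerU's own page options are commonly described), and the helper name page_range_to_params is hypothetical.

```python
def page_range_to_params(first_page, last_page):
    """Convert a 1-based inclusive page range to assumed 0-based parameters."""
    if first_page < 1:
        raise ValueError("first_page must be 1 or greater")
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    # Assumption: start_page and end_page are zero-based and inclusive.
    return {"start_page": first_page - 1, "end_page": last_page - 1}


# "pages 10 through 20" from the example prompt
params = page_range_to_params(10, 20)
print(params)
```

If the skill turns out to use 1-based pages, the subtraction should be dropped; checking a short range against the output is a quick way to confirm.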
Tips & Limitations
- Backend Selection: The hybrid-auto-engine is suitable for most general tasks. If you are processing documents with very complex layouts, try the vlm-auto-engine for superior visual comprehension.
- Performance: High-resolution OCR and layout analysis are computationally expensive. Processing very long PDFs can take significant time; consider using the start_page and end_page parameters for granular tasks.
- Output Integrity: While MinerU is highly accurate, verify complex nested tables after conversion, as highly non-standard document formatting can occasionally lead to structural artifacts.
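One way to act on the performance tip is to split a long document into page batches and process them one at a time. The sketch below only computes the batches; it assumes start_page and end_page are zero-based and inclusive, and the helper name chunk_pages is hypothetical.

```python
def chunk_pages(total_pages, chunk_size):
    """Yield start_page/end_page pairs that cover `total_pages` in batches."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be 1 or greater")
    for start in range(0, total_pages, chunk_size):
        # Assumption: both bounds are zero-based and inclusive.
        end = min(start + chunk_size, total_pages) - 1
        yield {"start_page": start, "end_page": end}


# A 25-page document processed 10 pages at a time
for batch in chunk_pages(25, 10):
    print(batch)
```

Smaller batches also make it cheaper to retry a single failed range than to rerun the whole document.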
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-baokui-pdf-parser-mineru": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, file-write
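If clawhub.json already lists other plugins, pasting the fragment over the file would discard them. The sketch below merges the plugin entry into an existing configuration held in memory. It assumes only what the fragment shows, namely a top-level "plugins" object keyed by plugin ID; the helper name enable_plugin and the other-plugin entry are hypothetical.

```python
import copy
import json

PLUGIN_ID = "official-baokui-pdf-parser-mineru"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of `config` with the plugin enabled; other entries are kept."""
    updated = copy.deepcopy(config)
    plugins = updated.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    return updated


# Hypothetical existing configuration with one unrelated plugin
existing = {"plugins": {"other-plugin": {"enabled": False}}}
print(json.dumps(enable_plugin(existing), indent=2))
```

Returning a copy rather than mutating the input leaves the original configuration available for comparison or rollback.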
Related Skills
Pdf Ocr Layout
Skill by baokui
llm-video-generator
Generate videos from text descriptions using the ZhipuAI CogVideoX-3 model. Supports text-to-video, image-to-video, and first/last frame-to-video generation. Automatically handles long videos (over 5s) by chaining multiple generation calls with last-frame continuation. Use when the user asks to create or generate a video from text, make a video, text-to-video, 文生视频 (text-to-video), 生成视频 (generate a video), 做个视频 (make a video), or makes any request involving converting text or images into a video. Supports configuring video content, style, resolution (up to 4K), frame rate (30/60fps), audio, and duration.
wan-t2i
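Alibaba Cloud DashScope Wan2.6 text-to-image tool. Generates images with the Wan2.6-t2i model on the Alibaba Cloud Bailian platform. Triggered when the user needs AI image generation, text-to-image, or an image generated from text. Requires the DASHSCOPE_API_KEY environment variable (already configured in the system).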
glm-v-model
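Skill for calling the Zhipu GLM-4V/4.6V vision models. Used for image and video understanding, multimodal dialogue, chart analysis, and similar tasks. Use this skill when the user mentions image understanding, image recognition, vision models, GLM-4V, GLM-4.6V, multimodal analysis, describing a picture, chart analysis, or video understanding.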