What This Skill Does

The mineru-pdf-extractor is a specialized tool integrated into the OpenClaw agent ecosystem, designed to bridge the gap between complex PDF document structures and machine-readable Markdown. Unlike basic text scrapers, this skill leverages the MinerU API to provide advanced document processing capabilities, including high-fidelity optical character recognition (OCR), structural table extraction, and complex scientific formula parsing. It effectively transforms static document layouts into clean, structured Markdown, making the information suitable for RAG systems, model training, or document management workflows.

Installation

To use this skill, first make sure the OpenClaw environment is configured. Then install the skill from the command line:

    clawhub install openclaw/skills/skills/a-i-r/mineru-pdf-extractor

Once installed, register at https://mineru.net/ to obtain an API key, and set it as an environment variable in your terminal session:

    export MINERU_TOKEN="your_api_token_here"

curl and unzip must also be installed on your system; they are required for the file transfer and extraction steps.
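Before running the skill, it can help to confirm that the prerequisites named above are in place. The following is a minimal sketch, not part of the skill itself: the function name check_mineru_prereqs is hypothetical, and it checks only what this section lists (curl, unzip, and a non-empty MINERU_TOKEN).

```shell
# Hypothetical helper (not shipped with the skill): verifies the
# prerequisites listed in the Installation section.
check_mineru_prereqs() {
  local missing=0
  local tool

  # curl and unzip are the two tools this section says are required.
  for tool in curl unzip; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done

  # The API key is expected in the MINERU_TOKEN environment variable.
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set" >&2
    missing=1
  fi

  # Returns 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

A typical call would be: check_mineru_prereqs && echo "ready". The check confirms only that the variable is non-empty; it cannot tell whether the key is valid.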

Use Cases

This skill is perfect for researchers, developers, and data analysts who need to convert dense, content-heavy PDF files into interoperable formats. Common use cases include:

Converting academic research papers with complex LaTeX formulas into Markdown for Obsidian or knowledge bases.
Extracting complex financial data tables from PDF reports into clean Markdown tables for spreadsheet integration.
Digitizing historical scanned documents using integrated OCR technology.
Preparing datasets for Large Language Model (LLM) fine-tuning or vector database ingestion.

Example Prompts

"Extract the local file './research/paper_001.pdf' using the MinerU extractor and save the results to the 'output' directory."
"I need to parse an online technical manual located at https://example.com/docs/manual.pdf. Please run the MinerU online parsing task and download the resulting Markdown."
"MinerU, convert this PDF file into Markdown, ensuring all tables are preserved in standard format and formulas are correctly interpreted."

Tips & Limitations

Large Files: Processing time grows with document length. For very long PDFs, do not assume the task has finished; check its status with the skill's poll script before fetching results.
Network Dependency: As this is an API-based service, ensure your system has stable internet access to reach the MinerU API endpoints.
Privacy: Be aware that your documents are uploaded to the MinerU service for processing; ensure you have appropriate data privacy clearances for sensitive documents.
Error Handling: If the process fails, verify your MINERU_TOKEN is active and that your local network allows outbound HTTPS traffic.
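The two checks in the Error Handling tip can be sketched as a small shell helper. This is an illustration under stated assumptions, not the skill's own diagnostics: the function names are hypothetical, the default URL is the MinerU site from the Installation section rather than a confirmed API endpoint, and the token check confirms only that the variable is set, not that the key is active.

```shell
# Hypothetical troubleshooting helpers; they do not call the MinerU API.

# Succeeds when MINERU_TOKEN is set to a non-empty value.
check_token_set() {
  [ -n "${MINERU_TOKEN:-}" ]
}

# Sends a HEAD request to confirm outbound HTTPS works.
# Assumption: the site URL stands in for the real API endpoints.
check_https_reachable() {
  local url="${1:-https://mineru.net/}"
  curl --silent --head --max-time 10 --output /dev/null "$url"
}

# Runs both checks in order and reports the first one that fails.
diagnose_mineru() {
  if ! check_token_set; then
    echo "MINERU_TOKEN is empty or unset"
    return 1
  fi
  if ! check_https_reachable "$@"; then
    echo "cannot reach MinerU over HTTPS"
    return 2
  fi
  echo "token is set and HTTPS is reachable"
}
```

Running diagnose_mineru with no arguments uses the default URL; passing a different URL as the first argument checks that host instead. A return status of 1 points to the token, and 2 points to the network.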
