mineru-pdf-extractor
Extract PDF content to Markdown using the MinerU API. Supports formulas, tables, and OCR. Provides both local file and online URL parsing methods.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/a-i-r/mineru-pdf-extractor
What This Skill Does
The mineru-pdf-extractor skill is part of the OpenClaw agent ecosystem and converts PDF documents into machine-readable Markdown. Unlike basic text scrapers, it uses the MinerU API for optical character recognition (OCR), structural table extraction, and scientific formula parsing. The resulting Markdown is suitable for RAG systems, model training, or document management workflows.
Installation
To begin using this skill, ensure you have the OpenClaw environment configured. Install the skill via the command line:
clawhub install openclaw/skills/skills/a-i-r/mineru-pdf-extractor
Once installed, register at https://mineru.net/ to obtain an API key. Set the key as an environment variable in your terminal session:
export MINERU_TOKEN="your_api_token_here"
Make sure curl and unzip are installed on your system, as the skill requires them for file transfers and extraction.
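The prerequisites above (a MINERU_TOKEN value plus curl and unzip on the PATH) can be checked before running the skill. The following is a minimal sketch, not part of the skill itself; the function names missing_tools, token_is_set, and preflight are hypothetical helpers introduced only for illustration.

```shell
#!/usr/bin/env bash
# Illustrative preflight check; these helper names are assumptions,
# not commands shipped with the mineru-pdf-extractor skill.

# Print the name of each given tool that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed only if MINERU_TOKEN is set and non-empty.
token_is_set() {
  [ -n "${MINERU_TOKEN:-}" ]
}

# Check the tools first, then the token; report the first problem found.
preflight() {
  local missing
  missing="$(missing_tools curl unzip)"
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing required tools:" $missing >&2
    return 1
  fi
  if ! token_is_set; then
    echo "MINERU_TOKEN is not set or is empty" >&2
    return 1
  fi
  echo "Preflight OK"
}
```

Sourcing this file and running preflight reports the first missing prerequisite. It only confirms that the token variable is non-empty; it does not contact the MinerU service to validate the key.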
Use Cases
This skill is aimed at researchers, developers, and data analysts who need to convert dense PDF files into Markdown. Common use cases include:
- Converting academic research papers with complex LaTeX formulas into Markdown for Obsidian or knowledge bases.
- Extracting complex financial data tables from PDF reports into clean Markdown tables for spreadsheet integration.
- Digitizing historical scanned documents using integrated OCR technology.
- Preparing datasets for Large Language Model (LLM) fine-tuning or vector database ingestion.
Example Prompts
- "Extract the local file './research/paper_001.pdf' using the MinerU extractor and save the results to the 'output' directory."
- "I need to parse an online technical manual located at https://example.com/docs/manual.pdf. Please run the MinerU online parsing task and download the resulting Markdown."
- "MinerU, convert this PDF file into Markdown, ensuring all tables are preserved in standard format and formulas are correctly interpreted."
Tips & Limitations
- Large Files: Processing time for very long PDFs may vary; check the task status via the poll script rather than assuming the task has finished.
- Network Dependency: As this is an API-based service, ensure your system has stable internet access to reach the MinerU API endpoints.
- Privacy: Be aware that your documents are uploaded to the MinerU service for processing; ensure you have appropriate data privacy clearances for sensitive documents.
- Error Handling: If the process fails, verify that your MINERU_TOKEN is active and that your local network allows outbound HTTPS traffic.
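The advice about checking status on long-running tasks amounts to a bounded polling loop. The sketch below shows one way such a loop might look. It does not reproduce the skill's own poll script, whose name and output format are not given here. The function poll_until and the status words "done" and "failed" are assumptions made for illustration, and the status command passed in is a placeholder for whatever actually reports task state.

```shell
#!/usr/bin/env bash
# Illustrative polling loop; poll_until and the "done"/"failed" status
# words are assumptions, not the interface of the skill's poll script.

# Usage: poll_until STATUS_CMD MAX_ATTEMPTS DELAY_SECONDS
# Runs STATUS_CMD until it prints "done" (return 0) or "failed" (return 1),
# or gives up after MAX_ATTEMPTS checks (return 2).
poll_until() {
  local status_cmd="$1" max_attempts="$2" delay="$3"
  local attempt=1 status
  while [ "$attempt" -le "$max_attempts" ]; do
    status="$("$status_cmd")"
    case "$status" in
      done)
        echo "finished after $attempt attempt(s)"
        return 0
        ;;
      failed)
        echo "task failed on attempt $attempt" >&2
        return 1
        ;;
    esac
    # Any other status is treated as "still running": wait and retry.
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "timed out after $max_attempts attempt(s)" >&2
  return 2
}
```

Capping the number of attempts keeps a stalled task from blocking the session indefinitely, and the distinct return codes let a caller tell a failed task apart from a timeout.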
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-a-i-r-mineru-pdf-extractor": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, file-write, external-api, network-access