Official | Verified | file management | Safety 4/5

mineru-pdf-extractor

Extract PDF content to Markdown using MinerU API. Supports formulas, tables, OCR. Provides both local file and online URL parsing methods.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/a-i-r/mineru-pdf-extractor

What This Skill Does

The mineru-pdf-extractor skill plugs into the OpenClaw agent ecosystem and converts PDFs with complex layouts into machine-readable Markdown. Unlike basic text scrapers, it uses the MinerU API for high-fidelity optical character recognition (OCR), structural table extraction, and parsing of complex scientific formulas. The result is clean, structured Markdown suitable for RAG systems, model training, or document management workflows.

Installation

To begin using this skill, make sure the OpenClaw environment is configured, then install the skill from the command line:

clawhub install openclaw/skills/skills/a-i-r/mineru-pdf-extractor

Once it is installed, register at https://mineru.net/ to obtain an API key. Make the key available to the skill by setting it as an environment variable in your terminal session:

export MINERU_TOKEN="your_api_token_here"

The skill also needs curl and unzip installed on your system; they handle the file transfers and the extraction steps.
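As a minimal sketch of the prerequisites above, the following shell function checks that MINERU_TOKEN is set and that the required tools are on the PATH. The name check_prereqs is hypothetical and not part of the skill. It only inspects the local environment and makes no calls to the MinerU API.

```shell
# check_prereqs: hypothetical helper, not shipped with the skill.
# Usage: check_prereqs [TOOL...]
# With no arguments it checks for curl and unzip, the two tools the
# listing names. Returns 0 if everything is present, 1 otherwise.
check_prereqs() {
    # Fall back to the documented tool list when none is given.
    [ "$#" -gt 0 ] || set -- curl unzip

    missing=0

    # The token is expected in the MINERU_TOKEN environment variable.
    if [ -z "${MINERU_TOKEN:-}" ]; then
        echo "MINERU_TOKEN is not set" >&2
        missing=1
    fi

    # Report every missing tool instead of stopping at the first one.
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "required tool not found: $tool" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

Running check_prereqs before the first extraction surfaces a missing token or tool up front, rather than partway through an upload.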

Use Cases

This skill is perfect for researchers, developers, and data analysts who need to convert dense, content-heavy PDF files into interoperable formats. Common use cases include:

  • Converting academic research papers with complex LaTeX formulas into Markdown for Obsidian or knowledge bases.
  • Extracting complex financial data tables from PDF reports into clean Markdown tables for spreadsheet integration.
  • Digitizing historical scanned documents using integrated OCR technology.
  • Preparing datasets for Large Language Model (LLM) fine-tuning or vector database ingestion.

Example Prompts

  1. "Extract the local file './research/paper_001.pdf' using the MinerU extractor and save the results to the 'output' directory."
  2. "I need to parse an online technical manual located at https://example.com/docs/manual.pdf. Please run the MinerU online parsing task and download the resulting Markdown."
  3. "MinerU, convert this PDF file into Markdown, ensuring all tables are preserved in standard format and formulas are correctly interpreted."

Tips & Limitations

  • Large Files: Processing time grows with document length and is not fixed. For very long PDFs, check the task status with the poll script rather than assuming the job has finished.
  • Network Dependency: As this is an API-based service, ensure your system has stable internet access to reach the MinerU API endpoints.
  • Privacy: Be aware that your documents are uploaded to the MinerU service for processing; ensure you have appropriate data privacy clearances for sensitive documents.
  • Error Handling: If the process fails, verify your MINERU_TOKEN is active and that your local network allows outbound HTTPS traffic.
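The advice to check status for long PDFs can be sketched as a bounded polling loop. This is not the skill's own poll script, whose name and interface are not described here. poll_until_done is a hypothetical helper, and the state strings "done" and "failed" are assumptions made for illustration. The status command is passed in by the caller, so no MinerU endpoint is assumed.

```shell
# poll_until_done: hypothetical helper, not the skill's poll script.
# Usage: poll_until_done MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# COMMAND must print the current task state on stdout. The state names
# "done" and "failed" are assumptions for this sketch.
# Returns 0 on "done", 1 on "failed", 2 if attempts run out.
poll_until_done() {
    max_attempts=$1
    delay=$2
    shift 2

    attempt=1
    while :; do
        # A failing status command is treated as a transient error
        # and retried like any other non-final state.
        state=$("$@") || state="error"

        case "$state" in
            "done")
                echo "finished after $attempt attempt(s)"
                return 0
                ;;
            "failed")
                echo "task reported failure" >&2
                return 1
                ;;
        esac

        # Stop once the attempt budget is spent; no sleep after the last try.
        if [ "$attempt" -ge "$max_attempts" ]; then
            break
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done

    echo "gave up after $max_attempts attempts" >&2
    return 2
}
```

Capping the number of attempts keeps a stalled task or an expired token from blocking the session indefinitely. The distinct return codes let a caller tell a reported failure apart from a timeout.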

Metadata

Author: @a-i-r
Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-a-i-r-mineru-pdf-extractor": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#pdf #markdown #ocr #parsing #automation

Safety Score: 4/5

Flags: file-read, file-write, external-api, network-access