Official Verified

mineru-pdf

Parse PDF documents with MinerU MCP to extract text, tables, and formulas. Supports multiple backends including MLX-accelerated inference on Apple Silicon.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/etoile04/mineru-pdf

MinerU PDF Parser

Parse PDF documents using MinerU MCP to extract structured content including text, tables, and formulas with MLX acceleration on Apple Silicon.

Installation

Option 1: Install MinerU MCP (for Claude Code)

claude mcp add --transport stdio --scope user mineru -- \
  uvx --from mcp-mineru python -m mcp_mineru.server

This registers the MinerU MCP server at user scope, so it is available in all of your Claude Code projects. Models are downloaded on first use.

Option 2: Use Direct Tool (preserves files)

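The skill includes a direct parsing tool, parse.py, that saves output to a directory you choose. The path shown here and in the examples below is the author's local install location; substitute the path where the skill lives on your machine: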

python /Users/lwj04/clawd/skills/mineru-pdf/parse.py <pdf_path> <output_dir> [options]

Advantages:

  • ✅ Files are saved permanently (not auto-deleted)
  • ✅ Full control over output location
  • ✅ No MCP overhead
  • ✅ Works with any Python environment that has MinerU installed

Quick Start

Method 1: Using the Direct Tool (Recommended)

# Parse entire PDF
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output"

# Parse specific pages
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --start-page 0 --end-page 2

# Use Apple Silicon optimization
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --backend vlm-mlx-engine

# Text only (faster)
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --no-table --no-formula
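The invocations above differ only in their optional flags, so a script that calls the tool repeatedly can assemble the argument list in one place. The following is a minimal sketch, not part of the skill: `build_parse_command` and its parameter names are hypothetical, and it emits only the flags shown above (`--start-page`, `--end-page`, `--backend`, `--no-table`, `--no-formula`).

```python
import sys


def build_parse_command(script, pdf_path, output_dir, start_page=None,
                        end_page=None, backend=None, tables=True,
                        formulas=True):
    """Build an argv list for parse.py (hypothetical helper, not part of the skill)."""
    # Positional arguments: script path, input PDF, output directory.
    cmd = [sys.executable, str(script), str(pdf_path), str(output_dir)]
    # Optional flags are appended only when the caller asks for them.
    if start_page is not None:
        cmd += ["--start-page", str(start_page)]
    if end_page is not None:
        cmd += ["--end-page", str(end_page)]
    if backend is not None:
        cmd += ["--backend", backend]
    if not tables:
        cmd.append("--no-table")
    if not formulas:
        cmd.append("--no-formula")
    return cmd


if __name__ == "__main__":
    # Text-only parse of a page range; the paths are placeholders.
    print(build_parse_command("parse.py", "/path/to/document.pdf",
                              "/path/to/output", start_page=0, end_page=2,
                              tables=False, formulas=False))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when paths contain spaces.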

Method 2: Using MinerU MCP (Temporary Files)

Parse a PDF document

uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def parse_pdf():
    result = await call_tool(
        name='parse_pdf',
        arguments={
            'file_path': '/path/to/document.pdf',
            'backend': 'pipeline',
            'formula_enable': True,
            'table_enable': True,
            'start_page': 0,
            'end_page': -1  # -1 for all pages
        }
    )
    if hasattr(result, 'content'):
        for item in result.content:
            if hasattr(item, 'text'):
                print(item.text)
                break

asyncio.run(parse_pdf())
"
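The result handling in the snippet above (take the first content item that carries text) can be factored into a small helper so it is reusable across tool calls. This is a sketch under assumptions: `first_text` is a hypothetical name, and the stand-in objects below only imitate the `content` / `text` attributes that the snippet checks for; a real result would come from `call_tool`. Unlike the snippet, the helper also skips items whose `text` is `None`.

```python
from types import SimpleNamespace


def first_text(result):
    """Return the text of the first content item that has one, else None."""
    # Tolerate results without a `content` attribute, as the hasattr check does.
    for item in getattr(result, "content", None) or []:
        text = getattr(item, "text", None)
        if text is not None:
            return text
    return None


# Stand-in objects for illustration only; they are not real MCP results.
fake_result = SimpleNamespace(content=[
    SimpleNamespace(data=b"binary"),           # item without text is skipped
    SimpleNamespace(text="# Document title"),  # first item with text wins
])
print(first_text(fake_result))
```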

Check system capabilities

uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def list_backends():
    result = await call_tool(
        name='list_backends',
        arguments={}
    )
    if hasattr(result, 'content'):
        for item in result.content:
            if hasattr(item, 'text'):
                print(item.text)
                break

asyncio.run(list_backends())
"

Parameters

parse_pdf

Required:

  • file_path - Absolute path to the PDF file
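Because `file_path` must be absolute, a relative or `~`-prefixed path needs normalizing before it is passed to `parse_pdf`. The helper below is one possible way to do that with the standard library; the name `to_absolute_pdf_path` and the `.pdf` suffix check are illustrative assumptions, not part of the skill.

```python
from pathlib import Path


def to_absolute_pdf_path(path):
    """Expand ~ and resolve a path to absolute form (hypothetical helper)."""
    resolved = Path(path).expanduser().resolve()
    # Reject obviously wrong inputs early; this checks the name only,
    # not whether the file exists or is a valid PDF.
    if resolved.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF path: {resolved}")
    return str(resolved)


if __name__ == "__main__":
    # A relative path is resolved against the current working directory.
    print(to_absolute_pdf_path("docs/report.pdf"))
```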

Metadata

Author: @etoile04
Stars: 2387
Views: 0
Updated: 2026-03-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-etoile04-mineru-pdf": {
      "enabled": true,
      "auto_update": true
    }
  }
}
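If clawhub.json already lists other plugins, pasting the fragment over the file would drop them. The sketch below merges the entry into an existing configuration instead. It assumes only the structure shown in the fragment above (a top-level `plugins` object keyed by plugin name); `enable_plugin` and `some-other-plugin` are hypothetical names used for illustration.

```python
import json


def enable_plugin(config, name, auto_update=True):
    """Return a copy of config with the named plugin enabled (hypothetical helper)."""
    merged = dict(config)
    # Copy the plugins mapping so the caller's dictionary is left untouched.
    plugins = dict(merged.get("plugins", {}))
    plugins[name] = {"enabled": True, "auto_update": auto_update}
    merged["plugins"] = plugins
    return merged


# An existing configuration with one unrelated, made-up plugin entry.
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
updated = enable_plugin(existing, "official-etoile04-mineru-pdf")
print(json.dumps(updated, indent=2))
```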
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.