Official Verified
mineru-pdf
Parse PDF documents with MinerU MCP to extract text, tables, and formulas. Supports multiple backends including MLX-accelerated inference on Apple Silicon.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/etoile04/mineru-pdf
MinerU PDF Parser
Parse PDF documents using MinerU MCP to extract structured content including text, tables, and formulas with MLX acceleration on Apple Silicon.
Installation
Option 1: Install MinerU MCP (for Claude Code)
claude mcp add --transport stdio --scope user mineru -- \
  uvx --from mcp-mineru python -m mcp_mineru.server
This installs and configures MinerU for all Claude projects. Models are downloaded on first use.
Option 2: Use Direct Tool (preserves files)
The skill includes a direct parsing tool that saves output to a persistent directory:
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py <pdf_path> <output_dir> [options]
Advantages:
- ✅ Files are saved permanently (not auto-deleted)
- ✅ Full control over output location
- ✅ No MCP overhead
- ✅ Works with any Python environment that has MinerU
Quick Start
Method 1: Using the Direct Tool (Recommended)
# Parse entire PDF
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output"

# Parse specific pages
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --start-page 0 --end-page 2

# Use Apple Silicon optimization
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --backend vlm-mlx-engine

# Text only (faster)
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --no-table --no-formula
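When the direct tool is called from a script rather than typed by hand, the variants above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in the examples; the function name `build_parse_command` and the `PARSE_SCRIPT` placeholder are hypothetical, and the script path should be replaced with the location of `parse.py` on your machine.

```python
from typing import List, Optional

# Hypothetical placeholder: point this at the skill's parse.py on your machine.
PARSE_SCRIPT = "parse.py"


def build_parse_command(
    pdf_path: str,
    output_dir: str,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    backend: Optional[str] = None,
    tables: bool = True,
    formulas: bool = True,
    script: str = PARSE_SCRIPT,
) -> List[str]:
    """Assemble an argument list for parse.py from the flags shown above."""
    cmd = ["python", script, pdf_path, output_dir]
    if start_page is not None:
        cmd += ["--start-page", str(start_page)]
    if end_page is not None:
        cmd += ["--end-page", str(end_page)]
    if backend is not None:
        cmd += ["--backend", backend]
    if not tables:
        cmd.append("--no-table")
    if not formulas:
        cmd.append("--no-formula")
    return cmd


# Text-only parse of the first pages, mirroring the shell examples.
print(build_parse_command("/path/to/document.pdf", "/path/to/output",
                          start_page=0, end_page=2,
                          tables=False, formulas=False))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.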
Method 2: Using MinerU MCP (Temporary Files)
Parse a PDF document
uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def parse_pdf():
    result = await call_tool(
        name='parse_pdf',
        arguments={
            'file_path': '/path/to/document.pdf',
            'backend': 'pipeline',
            'formula_enable': True,
            'table_enable': True,
            'start_page': 0,
            'end_page': -1  # -1 for all pages
        }
    )
    if hasattr(result, 'content'):
        for item in result.content:
            if hasattr(item, 'text'):
                print(item.text)
                break

asyncio.run(parse_pdf())
"
Check system capabilities
uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def list_backends():
    result = await call_tool(
        name='list_backends',
        arguments={}
    )
    if hasattr(result, 'content'):
        for item in result.content:
            if hasattr(item, 'text'):
                print(item.text)
                break

asyncio.run(list_backends())
"
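Both MCP snippets above end with the same loop that prints the first text item in the result. That step can be factored into a small helper, sketched here. The name `first_text` is hypothetical, and the `SimpleNamespace` object only imitates the result shape the snippets assume (a `content` list whose items may carry a `text` attribute); it is not an actual `mcp_mineru` return value.

```python
from types import SimpleNamespace
from typing import Any, Optional


def first_text(result: Any) -> Optional[str]:
    """Return the text of the first content item that has a .text attribute."""
    for item in getattr(result, "content", []):
        if hasattr(item, "text"):
            return item.text
    return None


# Stand-in result used only for illustration (shape is an assumption).
fake_result = SimpleNamespace(
    content=[SimpleNamespace(data=b""), SimpleNamespace(text="# Title")]
)
print(first_text(fake_result))  # prints "# Title"
```

Returning `None` when no text item is present lets the caller decide whether an empty result is an error, instead of silently printing nothing.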
Parameters
parse_pdf
Required:
file_path - Absolute path to the PDF file
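Because `file_path` must be absolute, it can help to check the path before calling the tool. The sketch below builds an `arguments` dictionary for `parse_pdf`. The helper name `build_parse_pdf_arguments` is hypothetical, and the other values are copied from the Quick Start example; they are not confirmed defaults of the server.

```python
from pathlib import Path
from typing import Any, Dict


def build_parse_pdf_arguments(file_path: str, **overrides: Any) -> Dict[str, Any]:
    """Build a parse_pdf arguments dict, rejecting relative paths early."""
    if not Path(file_path).is_absolute():
        raise ValueError(f"file_path must be absolute, got: {file_path!r}")
    # Values below mirror the Quick Start example (assumed, not official defaults).
    arguments: Dict[str, Any] = {
        "file_path": file_path,
        "backend": "pipeline",
        "formula_enable": True,
        "table_enable": True,
        "start_page": 0,
        "end_page": -1,  # -1 for all pages
    }
    arguments.update(overrides)
    return arguments


# Resolve a possibly relative path before building the arguments.
pdf = str(Path("document.pdf").resolve())
print(build_parse_pdf_arguments(pdf, table_enable=False))
```

`Path.resolve()` turns a relative path into an absolute one based on the current working directory, which is usually what a command-line wrapper wants.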
Add to Configuration
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-etoile04-mineru-pdf": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note
ClawKit audits metadata but not runtime behavior. Use with caution.
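If `clawhub.json` already lists other plugins, pasting the fragment over the file would discard them. The sketch below merges the entry instead. It assumes only the structure shown in the fragment (a top-level `plugins` object keyed by plugin ID); the function name `enable_plugin` and the `some-other-plugin` entry are hypothetical.

```python
import json
from typing import Any, Dict

PLUGIN_ID = "official-etoile04-mineru-pdf"


def enable_plugin(config: Dict[str, Any], plugin_id: str = PLUGIN_ID) -> Dict[str, Any]:
    """Return a copy of config with the plugin entry added, keeping other plugins."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


# Hypothetical existing configuration with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
print(json.dumps(enable_plugin(existing), indent=2))
```

The function copies the dictionaries it changes, so the caller's original configuration is left untouched until the merged result is written back to disk.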