marker-pdf-ocr
Convert PDF to Markdown using Marker OCR (local-first, cloud fallback)
Why use this skill?
Convert PDFs to accurate Markdown with the Marker OCR engine. Supports local-first private processing and cloud-based scaling for automated document pipelines.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/charpup/marker-pdf-ocr
What This Skill Does
The marker-pdf-ocr skill converts complex PDF documents into clean, structured Markdown, JSON, or HTML. Built on the Marker OCR engine, it handles diverse document types, including scientific papers, reports, and manuals, while preserving formatting, tables, and equations. It offers a local-first mode that keeps sensitive documents private and a cloud-based fallback for resource-constrained environments.
Installation
Installation options depend on your hardware and security requirements. For local-only processing, the most private option, install the Python packages with: pip install marker-pdf torch. Within the OpenClaw ecosystem, run openclaw skill install marker-pdf-ocr in your terminal. Make sure the machine meets the minimum memory requirements (4GB for local mode, 512MB for cloud mode), and consider configuring a swap file on resource-constrained hardware. For cloud mode, export MARKER_API_KEY as an environment variable.
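The memory thresholds and the MARKER_API_KEY requirement above can be turned into a small preflight check. The following is a minimal sketch, not part of the skill itself: the helper names choose_mode and local_marker_installed are hypothetical, and it assumes the marker-pdf package is importable under the module name "marker". The total memory is passed in as a parameter because measuring it portably is outside the scope of this sketch.

```python
import importlib.util
import os

LOCAL_MIN_MB = 4096  # 4GB minimum for local mode, as stated in the text
CLOUD_MIN_MB = 512   # 512MB minimum for cloud mode, as stated in the text


def choose_mode(total_mem_mb, local_installed, api_key):
    """Return "local", "cloud", or None if neither mode is usable.

    Hypothetical helper: prefers local mode (local-first), then cloud.
    """
    if local_installed and total_mem_mb >= LOCAL_MIN_MB:
        return "local"
    if api_key and total_mem_mb >= CLOUD_MIN_MB:
        return "cloud"
    return None


def local_marker_installed():
    # Assumption: the marker-pdf package is importable as "marker".
    return importlib.util.find_spec("marker") is not None


if __name__ == "__main__":
    mode = choose_mode(
        total_mem_mb=8192,  # placeholder; replace with a real measurement
        local_installed=local_marker_installed(),
        api_key=os.environ.get("MARKER_API_KEY"),
    )
    print(mode)
```

Preferring local mode whenever it is viable mirrors the local-first design described earlier; cloud mode is only selected when a key is present and local mode is ruled out.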
Use Cases
This skill is perfect for researchers, data scientists, and information architects who need to ingest large quantities of PDFs into Large Language Models (LLMs) or knowledge management systems. It is ideal for automating the ingestion of academic literature, digitizing legacy archives, or preparing business reports for structured data analysis. Because the tool supports batch processing, it can handle entire directories of documents, making it a critical asset for large-scale digitization projects or continuous data pipelines.
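Batch processing of a directory can be sketched as two steps: pair each PDF with an output path, then run a converter over the pairs. This is an illustrative sketch only. The helpers plan_batch and run_batch are hypothetical, and the convert argument stands in for whatever conversion call your setup provides (local or cloud); it is assumed to take a PDF path and return the converted text.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir, output_format="markdown"):
    """Pair each PDF in input_dir with an output path (hypothetical helper)."""
    extensions = {"markdown": ".md", "json": ".json", "html": ".html"}
    suffix = extensions[output_format]
    out_root = Path(output_dir)
    pdfs = sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return [(pdf, out_root / (pdf.stem + suffix)) for pdf in pdfs]


def run_batch(plan, convert):
    """Call convert(pdf_path) -> str for each pair and write the result.

    Failures are collected rather than raised, so one bad PDF does not
    halt the rest of the batch.
    """
    failures = []
    for pdf, target in plan:
        try:
            text = convert(pdf)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")
        except Exception as exc:
            failures.append((pdf, exc))
    return failures
```

Collecting failures instead of stopping at the first error suits large digitization runs, where a handful of corrupt files should not block the remainder.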
Example Prompts
- "OpenClaw, use marker-pdf-ocr to convert the research paper in my documents folder to Markdown for my Obsidian vault."
- "Please run a batch conversion on all PDF reports in the current directory and output the results in JSON format."
- "Run a health check on the marker-pdf tool and inform me if it is configured to use local or cloud processing."
Tips & Limitations
Monitor RAM usage during local conversions, as the underlying OCR models are memory-intensive. If you encounter out-of-memory (OOM) errors, increase your swap space or switch to cloud mode, which needs far less local memory. Local mode keeps documents private but requires significant CPU resources. Verify your output format requirements: Markdown is the default, but JSON can provide better metadata for downstream automated systems. If you are processing sensitive or confidential documents, enforce local mode so the data stays on your hardware.
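The fallback and privacy advice above can be combined into one rule: try local conversion first, fall back to cloud on an out-of-memory failure, and never fall back for sensitive documents. The sketch below is hypothetical. The function convert_with_fallback and its local_convert and cloud_convert arguments are illustrative stand-ins, not part of the skill, and it assumes an OOM condition surfaces as Python's MemoryError. In practice the process may instead be killed by the operating system, which this sketch cannot catch.

```python
def convert_with_fallback(pdf_path, local_convert, cloud_convert, sensitive=False):
    """Try local conversion first; fall back to cloud on MemoryError.

    Returns a (mode, result) tuple. If sensitive is True, the error is
    re-raised so the document is never sent off the machine.
    """
    try:
        return "local", local_convert(pdf_path)
    except MemoryError:
        if sensitive:
            # Confidential documents must stay local, so do not fall back.
            raise
        return "cloud", cloud_convert(pdf_path)
```

Making the sensitive flag an explicit argument keeps the data-residency decision visible at every call site rather than buried in configuration.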
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-charpup-marker-pdf-ocr": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-read, file-write, network-access, external-api
Related Skills
task-workflow-v3
Intelligent task scheduling system V3 - supports file persistence, progress tracking, and automatic archiving
Notion Md Converter
Skill by charpup
galatea-memory
Galatea memory management enhancement system - implements layered memory, automatic checkpoints, and key information tagging
task-workflow
Standardized Planning + Subagent + Progress Report workflow for complex tasks
Openclaw Config Validator
Skill by charpup