Official | Verified | data analysis | Safety 4/5

marker-pdf-ocr

Convert PDF to Markdown using Marker OCR (local-first, cloud fallback)

Why use this skill?

Convert PDFs to accurate Markdown with the Marker OCR engine. Supports local-first private processing and cloud-based scaling for automated document pipelines.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/charpup/marker-pdf-ocr

What This Skill Does

The marker-pdf-ocr skill converts complex PDF documents into clean, structured Markdown, JSON, or HTML. It uses the Marker OCR engine to handle scientific papers, reports, and manuals while preserving formatting, tables, and equations. It runs local-first, which keeps sensitive documents on your own machine, and falls back to cloud processing in resource-constrained environments.

Installation

You can install the skill in two ways, depending on your hardware and security requirements. For local-only processing, the most private option, install the Python packages with pip install marker-pdf torch. To use the skill within the OpenClaw ecosystem, run openclaw skill install marker-pdf-ocr in your terminal. Make sure your environment meets the minimum memory requirements (4GB for local mode, 512MB for cloud mode), and consider configuring a swap file on resource-constrained hardware. Cloud mode also requires the MARKER_API_KEY environment variable to be exported.
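The memory thresholds and the MARKER_API_KEY requirement can be turned into a small pre-flight check. The following is a minimal sketch and not part of the skill: choose_mode and physical_memory_bytes are hypothetical helper names, and preferring local mode whenever 4GB is available is an assumption based on the local-first description.

```python
import os

# Thresholds taken from the stated minimum memory requirements.
LOCAL_MIN_BYTES = 4 * 1024**3    # 4GB for local mode
CLOUD_MIN_BYTES = 512 * 1024**2  # 512MB for cloud mode


def choose_mode(total_memory_bytes, env):
    """Return "local" or "cloud" (hypothetical helper, not part of the skill).

    Assumption: local mode is preferred whenever enough memory is present;
    cloud mode is used only when MARKER_API_KEY is set and non-empty.
    """
    if total_memory_bytes >= LOCAL_MIN_BYTES:
        return "local"
    if total_memory_bytes >= CLOUD_MIN_BYTES and env.get("MARKER_API_KEY"):
        return "cloud"
    raise RuntimeError(
        "Not enough memory for local mode, and cloud mode is unavailable "
        "(MARKER_API_KEY is missing or memory is below 512MB)."
    )


def physical_memory_bytes():
    """Total physical memory in bytes (POSIX systems only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
```

On a POSIX machine, choose_mode(physical_memory_bytes(), os.environ) reports which mode the current environment could support under these assumptions.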

Use Cases

This skill suits researchers, data scientists, and information architects who need to ingest large quantities of PDFs into Large Language Models (LLMs) or knowledge management systems. Typical uses include automating the ingestion of academic literature, digitizing legacy archives, and preparing business reports for structured data analysis. Because the tool supports batch processing, it can convert entire directories of documents, which makes it a good fit for large-scale digitization projects and continuous data pipelines.
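To make the batch workflow concrete, the sketch below plans a directory conversion by pairing each PDF with an output path in the chosen format. It only illustrates the bookkeeping around a batch run and does not call the skill itself; find_pdfs, plan_outputs, and the EXTENSIONS mapping are hypothetical names introduced for this example.

```python
from pathlib import Path

# Output formats named in the skill description, mapped to assumed file suffixes.
EXTENSIONS = {"markdown": ".md", "json": ".json", "html": ".html"}


def find_pdfs(input_dir):
    """Return the PDF files directly inside input_dir, sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def plan_outputs(pdf_paths, output_dir, output_format="markdown"):
    """Pair each PDF path with the output file a conversion would write.

    Non-PDF paths are skipped; the input order is preserved.
    """
    if output_format not in EXTENSIONS:
        raise ValueError(f"Unsupported output format: {output_format}")
    suffix = EXTENSIONS[output_format]
    out_dir = Path(output_dir)
    jobs = []
    for raw in pdf_paths:
        pdf = Path(raw)
        if pdf.suffix.lower() != ".pdf":
            continue
        jobs.append((pdf, out_dir / (pdf.stem + suffix)))
    return jobs
```

A pipeline could call plan_outputs(find_pdfs("reports"), "converted", "json") and hand each pair to whatever conversion step it uses, which also makes it easy to skip files whose output already exists.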

Example Prompts

  1. "OpenClaw, use marker-pdf-ocr to convert the research paper in my documents folder to Markdown for my Obsidian vault."
  2. "Please run a batch conversion on all PDF reports in the current directory and output the results in JSON format."
  3. "Run a health check on the marker-pdf tool and inform me if it is configured to use local or cloud processing."

Tips & Limitations

Monitor RAM usage during local conversions, because the underlying OCR models are memory-intensive. If you hit out-of-memory (OOM) errors, add swap space or switch to cloud mode, which needs far less local memory (512MB versus 4GB). Local mode keeps documents private but requires significant CPU resources. Check which output format you need: Markdown is the default, but JSON can provide better metadata for downstream automated systems. For sensitive or confidential documents, enforce local mode so that the data stays on your hardware.
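The fallback and privacy advice can be combined into one small wrapper. This is a sketch under stated assumptions: local_convert and cloud_convert are hypothetical callables supplied by the caller, and the local converter is assumed to surface memory exhaustion as Python's MemoryError. That may not hold in practice, since the operating system can kill the process instead.

```python
def convert_with_fallback(pdf_path, local_convert, cloud_convert, allow_cloud=True):
    """Try local conversion first; fall back to cloud on MemoryError.

    local_convert and cloud_convert are hypothetical callables that take a
    path and return the converted text. Set allow_cloud=False for sensitive
    documents so that nothing leaves the machine.
    Returns a (mode, result) tuple.
    """
    try:
        return "local", local_convert(pdf_path)
    except MemoryError:
        if not allow_cloud:
            # Sensitive document: re-raise rather than upload it.
            raise
        return "cloud", cloud_convert(pdf_path)
```

Passing allow_cloud=False makes the privacy rule explicit in code: a failed local run stops with an error instead of silently sending a confidential file to the cloud.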

Metadata

Author: @charpup
Stars: 1100
Views: 1
Updated: 2026-02-17

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-charpup-marker-pdf-ocr": {
      "enabled": true,
      "auto_update": true
    }
  }
}
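If clawhub.json already lists other plugins, pasting the snippet over the file would discard them. The sketch below merges the entry instead. It assumes only the JSON shape shown above; enable_plugin is a hypothetical helper, not a ClawHub API.

```python
import json

PLUGIN_ID = "official-charpup-marker-pdf-ocr"


def enable_plugin(config):
    """Return a copy of config with the marker-pdf-ocr plugin entry added.

    Existing plugin entries are preserved; the input dict is not modified.
    """
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    updated["plugins"] = plugins
    return updated


def enable_plugin_in_file(path="clawhub.json"):
    """Load the config at path, merge the plugin entry, and write it back."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(enable_plugin(config), fh, indent=2)
        fh.write("\n")
```

Keeping the merge as a pure function makes it possible to check the result on an in-memory dict before touching the file on disk.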

Tags (AI)

#pdf #ocr #markdown #document-parsing #data-extraction
Safety Score: 4/5

Flags: file-read, file-write, network-access, external-api