docling
Extract and parse content from web pages, PDFs, documents (docx, pptx), and images using the docling CLI with GPU acceleration. Use INSTEAD of web_fetch for extracting content from specific URLs when you need clean, structured text. Use Brave (web_search) for searching/discovering pages. Use docling when you HAVE a URL and need its content parsed.
Why use this skill?
Use Docling to transform PDFs, DOCX, and web pages into structured text. Powerful OCR with GPU acceleration for your OpenClaw agent.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/er3mit4/docling
What This Skill Does
Docling is a CLI utility that converts multi-format documents and web pages into clean, machine-readable structured text. It uses machine learning models and OCR to convert PDFs, DOCX, PPTX, and HTML into Markdown, text, JSON, or YAML. Unlike basic web scrapers, Docling is built for document understanding, and it can use GPU acceleration (CUDA) to speed up processing of scanned documents, complex layouts, and multi-page reports.
Installation
To integrate Docling into your OpenClaw agent, ensure you have the CLI installed globally on your system. The recommended installation method is via pipx to maintain environment isolation:
pipx install docling
After installation, verify that the docling command is accessible in your system PATH. For GPU-accelerated processing, confirm your NVIDIA CUDA drivers are properly configured by running python -c "import torch; print(torch.cuda.is_available())" with an interpreter that has torch installed (with pipx, torch lives in docling's isolated environment, not in the system Python). If CUDA is not detected, Docling will default to CPU processing, which is functional but slower for large documents.
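The two checks above can be combined into one small script. This is a minimal sketch, not part of the skill itself: the function name check_environment is hypothetical, and it only reports CUDA status if torch is importable from the interpreter that runs it.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether the docling CLI is on PATH and whether CUDA is visible."""
    report = {"docling_path": shutil.which("docling"), "cuda": False}
    # torch may not be importable from this interpreter (e.g. a pipx install
    # keeps it in an isolated environment), so only query CUDA when it is.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    print(check_environment())
```

A docling_path of None means the CLI is not on PATH. A cuda value of False means either that CUDA is unavailable or that torch could not be imported here.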
Use Cases
Docling is the primary tool for content acquisition when a specific URL or file path is known. Use it for:
- Converting technical documentation (PDF/DOCX) into Markdown for agent context injection.
- Extracting content from modern, complex web pages where standard scrapers fail to parse layout correctly.
- OCR extraction for scanned documents, receipts, or legal forms.
- Standardizing multi-format inputs into a unified JSON schema for downstream AI agent workflows.
- Converting presentation decks (PPTX) into text for quick summary and analysis.
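For the use cases above, an agent typically ends up assembling one docling invocation per source. The sketch below builds such a command as an argument list. The helper name build_docling_command is hypothetical, and the --to and --output flags and the format names reflect my understanding of the docling CLI, so confirm them with docling --help on your installed version.

```python
from typing import List

# Output formats named in this skill's description (assumed CLI spellings).
ALLOWED_FORMATS = {"md", "text", "json", "yaml"}


def build_docling_command(source: str, output_dir: str, to_format: str = "md") -> List[str]:
    """Build the argv list for converting one URL or file path with docling."""
    if to_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {to_format!r}")
    # An argument list (rather than a shell string) avoids quoting problems
    # with URLs and paths that contain spaces or special characters.
    return ["docling", source, "--to", to_format, "--output", output_dir]


if __name__ == "__main__":
    cmd = build_docling_command("https://example.com/report.pdf", "/tmp/docling_out")
    print(" ".join(cmd))
```

To actually run the conversion, pass the list to subprocess.run(cmd, check=True). The sketch stops at building the command so it can be inspected first.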
Example Prompts
- "Docling, please extract the text content from this research paper URL https://example.com/report.pdf and save the output as markdown for me to review."
- "I need the table data from this document at /data/financials.docx. Use docling to convert it into a structured format I can analyze."
- "Run docling on this scanned PDF at /scans/invoice_001.pdf. Use GPU acceleration and make sure to extract all text content using OCR."
Tips & Limitations
- Prefer Docling over web_fetch: For targeted extraction from a known URL, Docling is far more robust against dynamic site rendering than standard fetch methods.
- Temp Storage: Always output files to a controlled directory (e.g., /tmp/docling_out) to prevent file system clutter.
- Security: Be cautious when using the --enable-remote-services flag. Only use this for trusted sources, as it may transmit data to third-party endpoints.
- GPU Utilization: For large PDF datasets, always explicitly set --device cuda to drastically reduce processing latency.
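The storage, security, and GPU tips can be enforced in one place when an agent assembles docling options. This is a rough sketch under assumptions: the helper name guarded_flags and its parameters are hypothetical, and only the flag names and the /tmp/docling_out default come from the tips above.

```python
from pathlib import Path
from typing import List


def guarded_flags(
    output_dir: str = "/tmp/docling_out",
    use_cuda: bool = True,
    enable_remote_services: bool = False,
    trusted_source: bool = False,
) -> List[str]:
    """Return docling flags that follow the storage, GPU, and security tips."""
    # Keep all output in one controlled directory, creating it if needed.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    flags = ["--output", output_dir, "--device", "cuda" if use_cuda else "cpu"]
    if enable_remote_services:
        # Remote services may send data to third-party endpoints, so refuse
        # unless the caller has explicitly marked the source as trusted.
        if not trusted_source:
            raise PermissionError("remote services requested for an untrusted source")
        flags.append("--enable-remote-services")
    return flags
```

These flags would be appended after the source argument, for example ["docling", source, *guarded_flags()]. Remote services stay off by default, so enabling them takes two deliberate choices by the caller.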
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-er3mit4-docling": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-write, file-read, network-access