Official Verified file management Safety 4/5

Pdfreader

Skill by nantes

Why use this skill?

Efficiently extract text, metadata, and data from PDF documents using the Pdfreader skill for OpenClaw. Streamline document analysis and AI processing.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/nantes/pdfreader


What This Skill Does

The Pdfreader skill is an OpenClaw utility that extracts, parses, and structures text from PDF documents. Built on the PyMuPDF library, it lets the AI agent ingest lengthy reports, academic papers, books, or technical manuals that exist only as static PDFs. It extracts text, handles various character encodings, and retrieves embedded document metadata such as titles, authors, and creation dates. The skill converts unstructured document files into machine-readable JSON, so the agent can query and reason over the contents.
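As a rough illustration of the extraction described above, the following sketch calls PyMuPDF directly and collects page text and metadata into a JSON-ready dictionary. It is not the skill's actual source code: the function names `build_payload`, `to_json`, and `read_pdf`, the `max_pages` parameter, and the output layout are all hypothetical and chosen only for this example.

```python
import json


def build_payload(metadata, page_texts, total_pages):
    """Assemble extracted data into a JSON-serializable dictionary.

    The key names used here are assumptions, not the skill's real schema.
    """
    metadata = metadata or {}
    return {
        "metadata": {
            # Missing fields may arrive as empty strings; normalize to None.
            "title": metadata.get("title") or None,
            "author": metadata.get("author") or None,
            "creation_date": metadata.get("creationDate") or None,
        },
        "page_count": total_pages,
        "pages": [
            {"number": index + 1, "text": text}
            for index, text in enumerate(page_texts)
        ],
    }


def to_json(payload):
    """Serialize the payload, keeping non-ASCII characters readable."""
    return json.dumps(payload, indent=2, ensure_ascii=False)


def read_pdf(path, max_pages=None):
    """Extract text and metadata from a PDF (hypothetical helper)."""
    # Imported lazily so the helpers above work without PyMuPDF installed.
    try:
        import pymupdf  # import name in newer PyMuPDF releases
    except ImportError:
        import fitz as pymupdf  # import name in older PyMuPDF releases

    with pymupdf.open(path) as doc:
        total = doc.page_count
        limit = total if max_pages is None else min(max_pages, total)
        # get_text() returns the embedded text layer of each page.
        texts = [doc[i].get_text() for i in range(limit)]
        return build_payload(doc.metadata, texts, total)
```

With PyMuPDF installed, `to_json(read_pdf("./reports/annual_2023.pdf", max_pages=5))` would produce JSON for the first five pages of that file. Capping the number of pages read is one simple way to bound memory use on very long documents.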

Installation

To install this skill, run the following command in your terminal:

clawhub install openclaw/skills/skills/nantes/pdfreader

Ensure the required dependency is installed in your local environment by running:

pip install pymupdf

Use Cases

This skill is indispensable for professionals dealing with documentation-heavy workflows. Use cases include:

Summarizing lengthy legal contracts or research papers.
Analyzing quarterly financial reports or white papers for specific data points.
Automating the conversion of manuals or scanned documents (once they carry an OCR text layer) into structured text for database ingestion.
Extracting key metadata from academic archives for library management.

Example Prompts

"Pdfreader, please parse the document at ./reports/annual_2023.pdf and summarize the key findings regarding our Q4 growth metrics."
"Can you use the Pdfreader skill to extract the full text from the technical manual located at ./manuals/device_specs.pdf and format the technical requirements into a list?"
"Extract the metadata from the document at ./papers/research_v1.pdf and tell me who the author is and how many pages it contains."

Tips & Limitations

Large files: PyMuPDF is efficient, but PDFs with very high page counts can still take noticeably longer to process. If memory usage becomes an issue, use the page-limiting argument to process only the pages you need.
Scanned PDFs: Note that this skill extracts embedded text layers. If your PDF is a set of raw images (scanned without OCR), you may need an additional OCR preprocessing step.
Output formats: Always prefer the --output=json flag for complex tasks, as it preserves structure better than raw console text.
Security: Only read PDFs from trusted sources to prevent malicious content from being processed by your agent's parser.
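The scanned-PDF limitation can be checked up front. The sketch below assumes you already have the extracted text of each page as a list of strings (for example from PyMuPDF's `page.get_text()`) and flags pages with little or no embedded text, which are likely candidates for OCR. The function name `pages_needing_ocr` and the `min_chars` threshold are hypothetical and not part of the skill.

```python
def pages_needing_ocr(page_texts, min_chars=1):
    """Return 1-based numbers of pages whose extracted text is too short.

    A page with fewer than `min_chars` non-whitespace-trimmed characters
    probably has no usable text layer and may need OCR preprocessing.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]
```

Raising `min_chars` makes the check stricter, which can help with scans that carry only a stray page number or watermark as embedded text.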


Metadata

Author: @nantes

Stars: 1335

Updated: 2026-02-23

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-nantes-pdfreader": {
      "enabled": true,
      "auto_update": true
    }
  }
}
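If clawhub.json already lists other plugins, pasting the snippet over the file would discard them. The following sketch merges the entry into an existing configuration instead. It assumes only the `plugins` layout shown above; the function name `enable_plugin` is hypothetical, and any other keys in the file are simply carried over unchanged.

```python
def enable_plugin(config, plugin_id, auto_update=True):
    """Return a copy of `config` with `plugin_id` enabled.

    Existing plugins and unrelated top-level keys are preserved,
    and the input dictionary is not modified.
    """
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": auto_update})
    plugins[plugin_id] = entry
    merged["plugins"] = plugins
    return merged
```

In practice you would load the file with `json.load`, pass the result through `enable_plugin(config, "official-nantes-pdfreader")`, and write it back with `json.dump`.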

Tags (AI)

#pdf #text-extraction #document-parser #data-processing #pymupdf

Safety Score: 4/5

Flags: file-read, file-write

Related Skills

a2a-protocol

Agent2Agent (A2A) Protocol implementation - communicate with other AI agents

nantes 1335

mcp-client

Model Context Protocol (MCP) client - connect to tools, data sources and services

nantes 1335

arxiv-osiris

Search and download research papers from arXiv.org - Research version for OpenClaw agents

nantes 1335

Agent Watcher

Skill by nantes

nantes 1335

simplemem

Efficient Lifelong Memory for LLM Agents - semantic compression, cross-session memory, and intent-aware retrieval

nantes 1335