What This Skill Does

The pdf-tools skill provides Python-based utilities for working with PDF (Portable Document Format) files. It lets OpenClaw agents extract raw text, inspect metadata and document structure, merge files, split documents page by page, rotate pages, and overlay text at a given position. The skill builds on the pdfplumber and PyPDF2 libraries to read and modify PDFs programmatically, which makes it useful for automated document management, report generation, and data processing workflows.

Installation

The skill requires Python 3 and two Python packages, pdfplumber and PyPDF2. Install them from your terminal:

pip3 install pdfplumber PyPDF2

After installing the requirements, install the skill into the OpenClaw environment using the agent's package management interface:

clawhub install openclaw/skills/skills/cmpdchtr/pdf-tools
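
To confirm that the dependencies are visible to the Python interpreter the agent will use, a quick check along the following lines can help. This is a minimal sketch and not part of the skill itself: the helper name missing_modules is hypothetical, and the import names pdfplumber and PyPDF2 are assumed to match the pip packages installed above.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed from the pip packages: pdfplumber and PyPDF2.
    missing = missing_modules(["pdfplumber", "PyPDF2"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All PDF dependencies are importable.")
```

If a module is reported missing even though pip3 succeeded, pip3 and the interpreter running this check may belong to different Python installations.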

Use Cases

Document Digitization and Summarization: Automatically extract text from long reports to feed into LLMs for summarization or analysis.
File Organization: Break down large binders or multi-chapter documents into individual page files for archiving or easier navigation.
Compliance and Reporting: Batch process documents by applying headers, footers, or watermarks as text overlays, or merge individual receipts and invoices into a single document for submission.
PDF Inspection: Verify file integrity and structure by reviewing metadata and page counts before processing.
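
For the text extraction use case, the underlying pdfplumber calls might look roughly like the sketch below. It is an illustration under stated assumptions, not the skill's actual implementation: the function names join_page_texts and extract_first_pages and the page_limit parameter are hypothetical. Pages without extractable text are treated as empty strings.

```python
from pathlib import Path


def join_page_texts(texts, separator="\n\n"):
    """Join per-page text, treating pages with no extractable text as empty."""
    return separator.join((text or "").strip() for text in texts)


def extract_first_pages(pdf_path, out_path, page_limit=3):
    """Write the text of the first `page_limit` pages of pdf_path to out_path."""
    # Third-party import kept local so join_page_texts works without pdfplumber.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # pdf.pages is a list of page objects; slicing keeps the leading pages.
        texts = [page.extract_text() for page in pdf.pages[:page_limit]]

    Path(out_path).write_text(join_page_texts(texts), encoding="utf-8")
    return len(texts)
```

Under these assumptions, extract_first_pages("report.pdf", "summary.txt") would write the text of the first three pages to summary.txt and return the number of pages read.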

Example Prompts

"Please extract the text from the first three pages of report.pdf and save it to a new file named summary.txt."
"I need you to take invoice.pdf, rotate pages 2 and 4 by 90 degrees, and merge the result with supplemental_info.pdf into a final document called output.pdf."
"Add the word 'DRAFT' as an overlay at coordinates (100, 100) on the first page of contract.pdf."
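
To give a sense of the work behind the rotate-and-merge prompt, here is a hedged sketch using PyPDF2 directly. It assumes PyPDF2 3.x (PdfReader, PdfWriter, and PageObject.rotate); the function names to_zero_based and rotate_and_merge are hypothetical and are not the skill's own interface. Page numbers are accepted as 1-indexed and converted to the 0-based indices the library uses.

```python
def to_zero_based(page_numbers, page_count):
    """Convert 1-indexed page numbers to 0-based indices, rejecting bad values."""
    indices = []
    for number in page_numbers:
        if not 1 <= number <= page_count:
            raise ValueError(f"page {number} is outside 1..{page_count}")
        indices.append(number - 1)
    return indices


def rotate_and_merge(main_pdf, extra_pdf, out_pdf, pages_to_rotate, angle=90):
    """Rotate selected pages of main_pdf, append extra_pdf, and write out_pdf."""
    # Third-party import kept local so to_zero_based works without PyPDF2.
    from PyPDF2 import PdfReader, PdfWriter

    main = PdfReader(main_pdf)
    rotate_at = set(to_zero_based(pages_to_rotate, len(main.pages)))

    writer = PdfWriter()
    for index, page in enumerate(main.pages):
        if index in rotate_at:
            page.rotate(angle)  # clockwise; the angle must be a multiple of 90
        writer.add_page(page)

    # Append every page of the second document after the first.
    for page in PdfReader(extra_pdf).pages:
        writer.add_page(page)

    with open(out_pdf, "wb") as handle:
        writer.write(handle)
```

With the file names from the prompt, the call would be rotate_and_merge("invoice.pdf", "supplemental_info.pdf", "output.pdf", [2, 4]). Validating page numbers before opening the writer means an out-of-range page fails early, before any output file is created.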

Tips & Limitations

Text-Based vs. Scanned: This skill is optimized for text-based PDFs. Scanned, image-based PDFs usually have no text layer, so text extraction is unlikely to yield reliable results unless the pages go through OCR first.
Editing Complexity: While text overlays are robust, full text replacement within existing PDF layers can be delicate. Always work on a copy of your source file to avoid accidental data corruption.
Indexing: All page numbers are 1-indexed, so page 1 is the first page of the document. Pass page numbers as they appear in a PDF viewer, not as 0-based offsets.
File Safety: The tools perform validation checks to ensure files exist, but verify your paths before running batch operations to prevent script failures.
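
The advice above about working on a copy and verifying paths before a batch run can be sketched with the standard library alone. The helper names partition_pdf_paths and working_copy are hypothetical, and the check is deliberately simple: a path counts as usable only if it is an existing file with a .pdf extension.

```python
import shutil
from pathlib import Path


def partition_pdf_paths(paths):
    """Split paths into existing .pdf files and everything else."""
    found, rejected = [], []
    for raw in paths:
        path = Path(raw)
        if path.is_file() and path.suffix.lower() == ".pdf":
            found.append(path)
        else:
            rejected.append(path)
    return found, rejected


def working_copy(source, work_dir):
    """Copy source into work_dir (created if needed) and return the copy's path."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, work_dir / Path(source).name))
```

A batch job could call partition_pdf_paths first, report the rejected paths, and then run each operation on working_copy(path, "work") so the source files are never modified.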
