Official | Verified | file management | Safety 4/5

pdf-cn

PDF document processing: read, extract, merge, and split PDFs. Supports text extraction, table recognition, and annotations. Trigger words: PDF, pdf.

Why use this skill?

Use the pdf-cn skill to extract text and tables from PDF documents and to merge or split them. It suits automated document workflows and data extraction tasks.


What This Skill Does

The pdf-cn skill manipulates PDF documents directly within the OpenClaw environment. It provides programmatic interfaces for extracting raw text, parsing tables into structured data, merging multiple PDF files, splitting documents into individual pages, and rotating pages. It is built on pypdf, pdfplumber, and reportlab, so document workflows that would otherwise need manual work or proprietary software can be automated. For research papers, financial reports, or standardized forms, pdf-cn converts static PDF layouts into machine-readable formats such as Excel or plain text.
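
As a rough illustration of the text and table extraction described above, the sketch below calls pdfplumber directly. It is not the skill's own implementation, which this listing does not show. The function names `rows_to_records` and `extract_pdf` are hypothetical, and the code assumes that the first row of each table is a header row.

```python
from typing import Any


def rows_to_records(table: list[list[Any]]) -> list[dict[str, str]]:
    """Turn a raw table (a list of rows) into dicts keyed by the first row.

    Assumption: the first row holds the column headers. Empty cells,
    which may arrive as None, become empty strings.
    """
    if not table:
        return []

    def clean(cell: Any) -> str:
        return "" if cell is None else str(cell).strip()

    # Fall back to a positional name when a header cell is empty.
    header = [clean(c) or f"col_{i}" for i, c in enumerate(table[0])]
    return [dict(zip(header, (clean(c) for c in row))) for row in table[1:]]


def extract_pdf(path: str) -> tuple[str, list[list[dict[str, str]]]]:
    """Return the document text and every table found, page by page."""
    import pdfplumber  # third-party; imported here so rows_to_records has no dependencies

    texts, tables = [], []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            texts.append(page.extract_text() or "")
            for raw in page.extract_tables():
                tables.append(rows_to_records(raw))
    return "\n".join(texts), tables
```

Each list of records can then be handed to `pandas.DataFrame(records)` and written out with `to_excel`, provided an Excel writer engine such as openpyxl is installed.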

Installation

To integrate this skill into your OpenClaw agent, use the official installation command provided by the repository:

clawhub install openclaw/skills/skills/guohongbin-git/pdf-cn

Ensure that the required Python dependencies are installed in your environment: pypdf and pdfplumber, plus pandas if you intend to export extracted tables to Excel.
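
One way to confirm that those dependencies are present before invoking the skill is a small standard-library check like the one below. This is a sketch and not part of pdf-cn; the helper name `missing_packages` is hypothetical, and the package list is taken from the libraries named in this listing.

```python
from importlib.util import find_spec


def missing_packages(names: list[str]) -> list[str]:
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the libraries mentioned in this listing.
    required = ["pypdf", "pdfplumber", "pandas"]
    missing = missing_packages(required)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All PDF dependencies are available.")
```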

Use Cases

  • Data Digitization: Extract tabular data from reports or invoices into structured CSV/Excel files for analysis. Scanned, image-only documents need OCR first (see Tips & Limitations).
  • Document Management: Programmatically combine individual PDF documents into a single master report or split large multi-page manuals into searchable, single-page files.
  • Information Extraction: Automate the retrieval of metadata or specific text sections from high-volume document archives.
  • Reporting Automation: Create new PDF reports from scratch using custom layouts and text data retrieved during the agent execution flow.

Example Prompts

  1. "PDF: Extract all tables from the attached document and save them into a file named summary.xlsx."
  2. "pdf: Split the file 'Project_Specs.pdf' into individual pages and rename them based on the page number."
  3. "PDF: Merge the three files 'part1.pdf', 'part2.pdf', and 'part3.pdf' into one complete document called 'Final_Project.pdf'."
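
For readers curious what the split and merge prompts above might translate to, here is a minimal sketch using pypdf. It assumes a recent pypdf release in which `PdfWriter.append` is available. The function names and the `_page_NNN` naming scheme are illustrative assumptions, not the skill's documented behavior.

```python
from pathlib import Path


def page_filename(stem: str, page_number: int, total_pages: int) -> str:
    """Build a per-page file name, zero-padded so the files sort in page order."""
    width = len(str(total_pages))
    return f"{stem}_page_{page_number:0{width}d}.pdf"


def split_pdf(source: str, out_dir: str) -> list[Path]:
    """Write each page of `source` to its own file inside `out_dir`."""
    from pypdf import PdfReader, PdfWriter  # third-party

    reader = PdfReader(source)
    total = len(reader.pages)
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        out_path = target / page_filename(Path(source).stem, number, total)
        with open(out_path, "wb") as handle:
            writer.write(handle)
        written.append(out_path)
    return written


def merge_pdfs(sources: list[str], destination: str) -> None:
    """Concatenate `sources`, in the order given, into `destination`."""
    from pypdf import PdfWriter  # third-party

    writer = PdfWriter()
    for source in sources:
        writer.append(source)
    with open(destination, "wb") as handle:
        writer.write(handle)
```

With these helpers, the third prompt would correspond to `merge_pdfs(["part1.pdf", "part2.pdf", "part3.pdf"], "Final_Project.pdf")`.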

Tips & Limitations

  • OCR Support: This skill works best with text-based PDFs. If you are processing image-based scanned documents, you may need an additional OCR engine (like Tesseract) for accurate text extraction.
  • Complex Layouts: Tables with merged cells or complex grid lines may be extracted imperfectly, so validate the extracted data manually.
  • Memory Usage: When merging or splitting extremely large documents, ensure your system has sufficient RAM to handle the document buffer.
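
To decide whether a document falls into the image-based category mentioned in the OCR tip, one simple heuristic is to flag pages that yield almost no extractable text. The sketch below is an assumption-laden illustration: the 20-character threshold is arbitrary, genuinely blank pages are flagged too, and the function names are hypothetical.

```python
from typing import Optional


def pages_needing_ocr(page_texts: list[Optional[str]], min_chars: int = 20) -> list[int]:
    """Return 1-based numbers of pages whose extracted text is nearly empty."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        if len((text or "").strip()) < min_chars:
            flagged.append(number)
    return flagged


def scan_for_ocr(path: str) -> list[int]:
    """Extract text with pypdf and report pages that probably need OCR."""
    from pypdf import PdfReader  # third-party

    reader = PdfReader(path)
    return pages_needing_ocr([page.extract_text() for page in reader.pages])
```

Pages returned by `scan_for_ocr` are candidates for an OCR engine such as Tesseract before text or table extraction is attempted.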

Metadata

Stars: 2387
Views: 1
Updated: 2026-03-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-guohongbin-git-pdf-cn": {
      "enabled": true,
      "auto_update": true
    }
  }
}
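
If clawhub.json already lists other plugins, pasting the snippet over the file would discard them. The sketch below merges the entry into an existing configuration instead. It assumes only what the snippet above shows, namely a plain JSON file with a top-level "plugins" object; the helper name `enable_plugin` and the sample existing entry are hypothetical.

```python
import json
from copy import deepcopy


def enable_plugin(config: dict, plugin_id: str, auto_update: bool = True) -> dict:
    """Return a copy of `config` with `plugin_id` enabled under "plugins"."""
    updated = deepcopy(config)
    plugins = updated.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry["enabled"] = True
    entry["auto_update"] = auto_update
    return updated


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated plugin.
    existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
    merged = enable_plugin(existing, "official-guohongbin-git-pdf-cn")
    print(json.dumps(merged, indent=2))
```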

Tags(AI)

#pdf #automation #document-processing #data-extraction #productivity
Safety Score: 4/5

Flags: file-write, file-read, code-execution