pdf-extraction
Extract text, tables, and metadata from PDFs using pdfplumber
Why use this skill?
Efficiently extract text, tables, and metadata from complex PDF files using the pdf-extraction skill. Perfect for automating document workflows.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/lijie420461340/pdf-extraction
What This Skill Does
The pdf-extraction skill for OpenClaw is a utility for developers and data analysts who need to work with PDF documents programmatically. It is built on the pdfplumber library and goes beyond simple text-to-string extraction: it exposes the internal structure of a PDF, including character-level positioning, font metadata, line analysis, and table detection. Whether you are dealing with text-based reports, structured financial statements, or multi-column academic papers, the skill lets the OpenClaw agent parse PDF content and convert it into machine-readable formats such as CSV or structured JSON. It suits document-heavy workflows that need precise, position-aware extraction.
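As a rough illustration of the kind of pdfplumber calls such a skill could wrap, the sketch below gathers metadata, text, and tables into a JSON-friendly structure. The helper names page_record and extract_pdf are hypothetical and are not part of the skill or of pdfplumber; only pdfplumber.open, pdf.metadata, pdf.pages, page.page_number, page.extract_text, and page.extract_tables are library calls.

```python
import json


def page_record(number, text, tables):
    """Bundle one page's extraction results into a JSON-friendly dict."""
    return {
        "page": number,
        "text": text or "",      # pages without extractable text yield an empty string
        "tables": tables or [],  # each table is a list of rows, each row a list of cells
    }


def extract_pdf(path):
    """Read metadata, text, and tables from a text-based PDF (hypothetical helper)."""
    import pdfplumber  # imported lazily so page_record has no third-party dependency

    with pdfplumber.open(path) as pdf:
        pages = [
            page_record(page.page_number, page.extract_text(), page.extract_tables())
            for page in pdf.pages
        ]
        return {"metadata": dict(pdf.metadata), "pages": pages}


def to_json(document):
    """Serialize the result; default=str covers metadata values that are not plain JSON types."""
    return json.dumps(document, indent=2, default=str)
```

Calling to_json(extract_pdf("report.pdf")) on a text-based file (the filename is a placeholder) would return the whole document as one JSON string.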
Installation
To install the skill, run the following command in a terminal where the clawhub CLI is available:
clawhub install openclaw/skills/skills/lijie420461340/pdf-extraction
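pdfplumber is a Python package built on pdfminer.six, so text and table extraction should not need poppler or other system libraries beyond its Python dependencies. Rendering pages to images for visual debugging may require additional imaging dependencies, depending on the pdfplumber version.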
Use Cases
The skill applies to several professional domains:
- Financial Data Processing: Automating the extraction of complex tables from bank statements or quarterly reports.
- Academic Research: Parsing large datasets or bibliographies from research papers for citation management.
- Legal Tech: Extracting specific clauses or metadata from legal contracts that maintain strict document formatting.
- Document Archiving: Converting legacy static PDF archives into clean, searchable, and processable database entries.
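For the table-oriented use cases above, a minimal sketch of turning extracted tables into CSV might look like the following. It assumes tables arrive in the shape pdfplumber's extract_tables() returns (a list of tables, each a list of rows, with None for empty cells); rows_to_csv and tables_as_csv are hypothetical helper names, not part of the skill.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize table rows (lists of cells) to CSV text; None cells become empty fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def tables_as_csv(path):
    """Concatenate every detected table in a PDF into one CSV string (hypothetical helper)."""
    import pdfplumber  # imported lazily so rows_to_csv works without pdfplumber installed

    rows = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                rows.extend(table)
    return rows_to_csv(rows)
```

Concatenating all tables into one file is a simplification; tables with different column layouts would usually be written to separate CSV files.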
Example Prompts
- "Extract all tables from this quarterly financial report and save them into a structured CSV file for me."
- "Please scan pages 5 through 10 of this document and extract the text, ensuring you preserve the original layout and indentation."
- "Identify the invoice total and the recipient company name from this PDF and provide them as a JSON object."
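The page-range and field-extraction prompts above could be served by something like the sketch below. It is only an illustration under stated assumptions: TOTAL_PATTERN is a hypothetical regular expression for lines such as "Invoice Total: $1,234.50", and find_invoice_total and extract_page_range are hypothetical helpers. The agent may well locate such fields differently.

```python
import re

# Hypothetical pattern: an optional "Invoice", the word "Total", then an amount with two decimals.
TOTAL_PATTERN = re.compile(
    r"\b(?:invoice\s+)?total\b\s*:?\s*\$?\s*([\d,]+\.\d{2})",
    re.IGNORECASE,
)


def find_invoice_total(text):
    """Return the first amount that follows the word 'Total', or None if there is none."""
    match = TOTAL_PATTERN.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def extract_page_range(path, first, last):
    """Extract layout-preserving text from pages first..last (1-based, inclusive)."""
    import pdfplumber  # imported lazily so the regex helper has no third-party dependency

    with pdfplumber.open(path) as pdf:
        selected = pdf.pages[first - 1:last]  # pdf.pages is a 0-based list
        return [page.extract_text(layout=True) for page in selected]
```

The word boundary before "total" keeps the pattern from matching inside "Subtotal", which matters on invoices that list both.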
Tips & Limitations
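- Tip: Pass 'layout=True' to text extraction when spacing and indentation matter; it approximates the visual layout of the page. For multi-column PDFs, check the output, since a visual layout does not by itself guarantee the logical reading order.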
- Tip: If a table is not detected correctly, inspect the page's 'rects' and 'lines' to check whether the table borders are actually drawn in the PDF. If they are not, line-based table detection has nothing to work with.
- Limitation: This skill is primarily for text-based PDFs. It does not perform Optical Character Recognition (OCR) on image-only PDFs. For those files, consider a pre-processing step using an OCR engine before feeding the results into this skill.
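The tips and the OCR limitation above can be turned into a quick per-page check. The sketch below is a heuristic, not the skill's actual logic: table_settings_for, looks_image_only, and inspect_page are hypothetical helpers, while page.lines, page.rects, page.chars, page.images, and the "lines" and "text" table strategies come from pdfplumber.

```python
def table_settings_for(line_count, rect_count):
    """Use ruling lines when the PDF draws any; otherwise fall back to text alignment."""
    strategy = "lines" if (line_count + rect_count) > 0 else "text"
    return {"vertical_strategy": strategy, "horizontal_strategy": strategy}


def looks_image_only(char_count, image_count):
    """A page with images but no text characters probably needs OCR first."""
    return char_count == 0 and image_count > 0


def inspect_page(path, page_number=1):
    """Report whether a page likely needs OCR and which table settings were tried."""
    import pdfplumber  # imported lazily so the two heuristics above stay dependency-free

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_number - 1]
        settings = table_settings_for(len(page.lines), len(page.rects))
        return {
            "needs_ocr": looks_image_only(len(page.chars), len(page.images)),
            "table_settings": settings,
            "tables": page.extract_tables(table_settings=settings),
        }
```

A page flagged with needs_ocr would be sent through an OCR engine before extraction, as the limitation describes.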
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-lijie420461340-pdf-extraction": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags
Flags: file-read, code-execution
Related Skills
career-compass
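Career Compass (职场罗盘) by Barry: a one-stop job-search assistant skill. It combines four modules: résumé parsing and optimization, employment-focused company research, local job search, and mock interviews. Given personal information or a résumé, it generates résumé improvement suggestions, a company research report, and a job-listing sheet, and can run mock interviews.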
wechat-article-export
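Multi-purpose export tool for WeChat Official Accounts. Exports Official Account articles as long screenshots (PNG), PDF, or Markdown, in any one or several of these formats. Trigger phrases: 「導出微信文章」 (export WeChat article), 「公眾號截圖」 (Official Account screenshot), 「文章轉PDF」 (article to PDF), 「文章轉Markdown」 (article to Markdown), 「微信導出」 (WeChat export).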
DocPilot
Intelligent document-processing expert supporting document parsing, information extraction, and document classification.
collab-to-skill
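Distills the workflows, decisions, and methods refined jointly by humans and agents into reusable Skills. Suited to extracting high-quality collaboration from chats or project work and packaging it as a distributable skill.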
accounting-assistant
Accounting automation with EÜR (income-surplus statement) preparation, DATEV export, PDF receipt analysis, and tax preparation. Ideal for freelancers and small and medium-sized businesses.