table-extractor
Extract tables from PDFs with high accuracy using camelot - handles complex table structures
Why use this skill?
Use the table-extractor skill to precisely parse complex PDF tables into pandas DataFrames. Supports lattice and stream methods for maximum data accuracy.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/lijie420461340/table-extractor
What This Skill Does
The table-extractor skill lets OpenClaw AI agents extract tabular data from complex PDF documents. It is built on the camelot-py library, which is typically more precise on tables than plain text-based extraction. It handles tables with varying structures, including merged cells, intricate grid layouts, and fully borderless designs. Each extracted table is converted into a pandas DataFrame, so the agent can immediately process, analyze, or export the data for downstream tasks.
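As a rough illustration of the PDF-to-DataFrame step, the sketch below calls camelot directly and keeps only the tables whose self-reported accuracy clears a threshold. The helper names (keep_accurate, extract_dataframes) and the 80% threshold are assumptions made for this example, not part of the skill; camelot.read_pdf, Table.df, and Table.parsing_report are camelot-py's own API.

```python
from typing import Any, Iterable, List


def keep_accurate(tables: Iterable[Any], min_accuracy: float = 80.0) -> List[Any]:
    """Return the tables whose parsing report meets the accuracy threshold.

    Each item is expected to expose `parsing_report`, a dict with an
    "accuracy" entry (a percentage), as camelot Table objects do.
    """
    return [
        table
        for table in tables
        if table.parsing_report.get("accuracy", 0.0) >= min_accuracy
    ]


def extract_dataframes(pdf_path: str, min_accuracy: float = 80.0) -> list:
    """Read page 1 of a PDF with camelot and return pandas DataFrames."""
    import camelot  # imported lazily so keep_accurate works without camelot

    tables = camelot.read_pdf(pdf_path, pages="1", flavor="lattice")
    return [table.df for table in keep_accurate(tables, min_accuracy)]
```

Filtering on the parsing report is only one possible quality gate; a stricter pipeline might also inspect the "whitespace" entry of the same report.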
Installation
To integrate this skill into your environment, run the following command in your terminal within your OpenClaw project workspace:
clawhub install openclaw/skills/skills/lijie420461340/table-extractor
Make sure the required system dependencies, such as Ghostscript, are installed, since camelot relies on it for PDF processing. Once installed, the skill is immediately available for your agent to invoke via natural language commands.
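A quick way to confirm the dependencies mentioned above is a small preflight check. This is a minimal sketch using only the standard library; the function name check_dependencies is hypothetical, and the list of Ghostscript executable names (gs on Unix-like systems, gswin64c or gswin32c on Windows) is an assumption about a typical installation.

```python
import importlib.util
import shutil


def check_dependencies() -> dict:
    """Report whether camelot and a Ghostscript executable can be found."""
    ghostscript_names = ("gs", "gswin64c", "gswin32c")  # Unix and Windows binaries
    return {
        "camelot": importlib.util.find_spec("camelot") is not None,
        "ghostscript": any(shutil.which(name) for name in ghostscript_names),
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```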
Use Cases
- Financial Reporting: Extract quarterly revenue tables or balance sheets from multi-page PDFs to automate financial modeling.
- Scientific Research: Capture complex data sets from academic papers where tables lack clear borders but require accurate row-column alignment.
- Invoice Processing: Pull line-item details from borderless or grid-based invoices for accounting automation.
- Data Migration: Convert legacy PDF documentation into CSV or Excel formats for modern database ingestion.
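For the data migration case, the export step could look roughly like the sketch below, which writes each extracted table to its own CSV file. The function name export_tables_to_csv and the file naming scheme are assumptions made for illustration; the only thing it relies on is a to_csv(path) method, which camelot Table objects provide.

```python
from pathlib import Path
from typing import Any, Iterable, List


def export_tables_to_csv(
    tables: Iterable[Any], out_dir: str, stem: str = "table"
) -> List[Path]:
    """Write each table to <out_dir>/<stem>-<n>.csv and return the paths.

    Items must provide `to_csv(path)`, as camelot Table objects do.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, table in enumerate(tables, start=1):
        path = target / f"{stem}-{index}.csv"
        table.to_csv(str(path))
        written.append(path)
    return written
```

With camelot installed, the result of camelot.read_pdf("report.pdf", pages="all") can be passed in directly as tables (report.pdf is a placeholder name).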
Example Prompts
- "Extract all tables from the attached document and summarize the quarterly revenue found in the first table."
- "Get the table on page 5 of this report and convert it into a format I can copy into Excel."
- "Process this PDF using the stream method to capture the borderless tables located in the appendix."
Tips & Limitations
For best results, determine whether your table is bordered or borderless before running the extraction: bordered tables call for the 'lattice' flavor, borderless ones for 'stream'. The skill is highly accurate, but it depends on the visual structure of the PDF. Scanned PDFs (images only) without an embedded text layer may require an OCR preprocessing step before this skill can be used. If a table spans multiple pages, pass the full page range so the entire dataset is captured. If automatic detection picks up noise from surrounding text, refine the output by manually specifying 'table_areas' to restrict extraction to the table's region.
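The tips above map onto three keyword arguments of camelot.read_pdf: flavor, pages, and table_areas. The sketch below shows one way to assemble them; build_read_options and read_tables are hypothetical helper names, and the coordinates in the usage note are made-up values.

```python
from typing import Optional, Sequence, Tuple

Area = Tuple[float, float, float, float]  # x1, y1 (top-left), x2, y2 (bottom-right)


def build_read_options(
    bordered: bool,
    pages: str = "1",
    areas: Optional[Sequence[Area]] = None,
) -> dict:
    """Translate the tips into keyword arguments for camelot.read_pdf."""
    options = {
        "flavor": "lattice" if bordered else "stream",
        "pages": pages,  # e.g. "5", "3-7", "1,4,6" or "all"
    }
    if areas:
        # camelot expects each area as the string "x1,y1,x2,y2" in PDF points
        options["table_areas"] = [",".join(str(v) for v in area) for area in areas]
    return options


def read_tables(pdf_path: str, **options):
    """Call camelot with the prepared options (requires camelot-py)."""
    import camelot

    return camelot.read_pdf(pdf_path, **options)
```

For a borderless table in an appendix on pages 3 to 7, a call might look like read_tables("report.pdf", **build_read_options(False, pages="3-7", areas=[(50, 700, 550, 400)])). Camelot returns tables page by page, so a table that spans pages may still need its DataFrames concatenated afterwards.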
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-lijie420461340-table-extractor": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags
Flags: file-read, code-execution
Related Skills
career-compass
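Career Compass (职场罗盘) by Barry: an all-in-one job-search assistant skill. It combines four modules: resume parsing and optimization, employment-focused company research, local job search, and mock interviews. Given personal information or a resume, it automatically generates resume improvement suggestions, a company research report, and a job listing sheet, and can also run mock interviews.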
wechat-article-export
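Multi-format export tool for WeChat Official Accounts. Exports Official Account articles as a long screenshot (PNG), PDF, or Markdown, with support for choosing one or several formats. Trigger phrases: 「導出微信文章」 (export WeChat article), 「公眾號截圖」 (Official Account screenshot), 「文章轉PDF」 (article to PDF), 「文章轉Markdown」 (article to Markdown), 「微信導出」 (WeChat export).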
scrapebadger
Web scraping platform — Twitter/X data, Vinted marketplace, and general web scraping API
Spreadsheet & Data Wrangling Master
Complete spreadsheet methodology — data cleanup, transformation, analysis, dashboards, automation, and reporting. Works with CSV, Excel, Google Sheets, or any tabular data. Use when the user needs to clean messy data, build reports, create dashboards, automate recurring spreadsheet tasks, or transform data between formats.
comparison-table-gen
Auto-generates comparison tables for concepts, drugs, or study results in Markdown format.