ClawKit Reliability Toolkit
Official | Verified | data analysis | Safety: 4/5

table-extractor

Extract tables from PDFs with high accuracy using camelot, including tables with complex structures.

Why use this skill?

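Use the table-extractor skill to parse complex PDF tables into pandas DataFrames. It supports camelot's lattice flavor (for tables drawn with ruling lines) and stream flavor (for tables laid out with whitespace), so you can pick whichever matches the table's layout.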

What This Skill Does

The table-extractor skill lets OpenClaw AI agents extract tabular data from PDF documents using the camelot-py library. Plain text extraction typically loses a table's row and column alignment; camelot instead reconstructs cell boundaries, either from the ruling lines drawn on the page (lattice) or from the whitespace between text (stream). This lets it handle tables with merged cells, intricate grid layouts, or no borders at all. Each detected table is returned as a pandas DataFrame, so the agent can immediately process, analyze, or export the data for downstream tasks.
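
For reference, the kind of camelot call the skill performs can be sketched as follows. This is a minimal sketch, not the skill's actual code: the helper names build_read_pdf_kwargs and extract_tables are hypothetical, and extract_tables assumes camelot-py is installed. camelot.read_pdf, its pages, flavor and table_areas parameters, and the .df attribute of each returned table are camelot's own API.

```python
# Sketch only: helper names are hypothetical, not the skill's real API.
# build_read_pdf_kwargs() is pure Python; extract_tables() needs camelot-py.
from typing import Optional, Sequence

# The two camelot flavors described on this page.
VALID_FLAVORS = ("lattice", "stream")


def build_read_pdf_kwargs(
    pages: str = "1",
    flavor: str = "lattice",
    table_areas: Optional[Sequence[str]] = None,
) -> dict:
    """Validate options and assemble keyword arguments for camelot.read_pdf."""
    if flavor not in VALID_FLAVORS:
        raise ValueError(f"flavor must be one of {VALID_FLAVORS}, got {flavor!r}")
    kwargs = {"pages": pages, "flavor": flavor}
    if table_areas:
        # Each area is an "x1,y1,x2,y2" string (top-left, bottom-right) in PDF points.
        kwargs["table_areas"] = list(table_areas)
    return kwargs


def extract_tables(pdf_path: str, **options) -> list:
    """Return one pandas DataFrame per table that camelot detects."""
    import camelot  # imported lazily so the option builder works without camelot

    tables = camelot.read_pdf(pdf_path, **build_read_pdf_kwargs(**options))
    return [table.df for table in tables]
```

For example, extract_tables("report.pdf", pages="5", flavor="stream") would return the DataFrames camelot finds on page 5 of a hypothetical report.pdf. Each camelot table also carries a parsing_report dict (accuracy, whitespace, order, page) that is worth checking before trusting the output. Some newer camelot releases may offer additional flavors; the check above deliberately allows only the two named on this page.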

Installation

To install the skill, run the following command in a terminal from your OpenClaw project workspace:

clawhub install openclaw/skills/skills/lijie420461340/table-extractor

Make sure camelot's own dependencies are installed on your system as well. camelot's lattice flavor renders each page as an image before detecting ruling lines, and depending on your camelot version and configuration it uses Ghostscript for that step, so install Ghostscript too. Once the skill is installed, your agent can invoke it through natural-language requests.
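
A quick pre-flight check can confirm that camelot and Ghostscript are both available before a first run. The sketch below assumes the usual Ghostscript executable names (gs on Linux and macOS, gswin64c or gswin32c on Windows) and that camelot is importable as the camelot module; the function names are hypothetical.

```python
# Hypothetical pre-flight check; the executable names are the common
# Ghostscript defaults and may differ on your system.
import importlib.util
import shutil
from typing import Optional

GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript() -> Optional[str]:
    """Return the path of the first Ghostscript executable on PATH, or None."""
    for name in GHOSTSCRIPT_NAMES:
        path = shutil.which(name)
        if path:
            return path
    return None


def check_dependencies() -> dict:
    """Report whether camelot and Ghostscript appear to be available."""
    return {
        "camelot": importlib.util.find_spec("camelot") is not None,
        "ghostscript": find_ghostscript() is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Running the file directly prints one line per dependency. A missing Ghostscript only matters if your camelot setup actually uses it to render pages.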

Use Cases

  • Financial Reporting: Extract quarterly revenue tables or balance sheets from multi-page PDFs to automate financial modeling.
  • Scientific Research: Capture complex data sets from academic papers where tables lack clear borders but require accurate row-column alignment.
  • Invoice Processing: Pull line-item details from borderless or grid-based invoices for accounting automation.
  • Data Migration: Convert legacy PDF documentation into CSV or Excel formats for modern database ingestion.
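
For the data-migration case, the extracted DataFrames can be written out with pandas. camelot's DataFrames use integer column labels (0, 1, 2, ...) and keep the table's header as ordinary rows, so this hypothetical helper writes neither the index nor the column labels. The function name and the file-naming scheme are assumptions for illustration.

```python
from pathlib import Path
from typing import List, Sequence

import pandas as pd


def export_frames_to_csv(
    frames: Sequence[pd.DataFrame], out_dir: str, stem: str = "table"
) -> List[Path]:
    """Write each DataFrame to <out_dir>/<stem>-<i>.csv and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, df in enumerate(frames, start=1):
        path = out / f"{stem}-{i}.csv"
        # camelot stores the header in row 0 and labels columns 0, 1, 2, ...,
        # so skip both the index and the numeric column labels.
        df.to_csv(path, index=False, header=False)
        paths.append(path)
    return paths
```

DataFrame.to_excel works the same way for Excel output but needs an engine such as openpyxl installed. camelot's own TableList.export method (for example tables.export("out.csv", f="csv")) is an alternative when you still have the TableList object.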

Example Prompts

  • "Extract all tables from the attached document and summarize the quarterly revenue found in the first table."
  • "Get the table on page 5 of this report and convert it into a format I can copy into Excel."
  • "Process this PDF using the stream method to capture the borderless tables located in the appendix."

Tips & Limitations

  • Check whether the table is bordered or borderless before extracting: use the 'lattice' flavor for tables drawn with ruling lines and the 'stream' flavor for tables that rely on whitespace to separate columns.
  • Accuracy depends on the visual structure of the PDF. camelot works only on text-based PDFs, so scanned PDFs (images without an embedded text layer) need an OCR preprocessing step before this skill can read them.
  • If a table spans multiple pages, pass a page range (for example pages='3-5') so every part is captured. camelot returns a separate table for each page, so the fragments must be concatenated afterwards.
  • If automatic detection picks up noise from surrounding text, specify 'table_areas' manually to restrict parsing to the table's bounding box.
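
Because camelot returns one table per page, a table that continues across pages comes back as several DataFrames. The sketch below assumes each continuation page repeats the same header rows (a common but not universal layout) and stitches the fragments with pandas.concat. stitch_pages and table_area are hypothetical helpers; the second simply formats a table_areas entry in camelot's "x1,y1,x2,y2" convention (top-left corner, then bottom-right corner, in PDF points measured from the page's bottom-left origin).

```python
from typing import Sequence

import pandas as pd


def stitch_pages(frames: Sequence[pd.DataFrame], header_rows: int = 1) -> pd.DataFrame:
    """Concatenate per-page fragments of one logical table.

    Assumes camelot-style frames: integer column labels, with the header kept
    in the first `header_rows` rows and possibly repeated on each page.
    """
    if not frames:
        raise ValueError("no frames to stitch")
    header = frames[0].iloc[:header_rows].values.tolist()
    parts = [frames[0]]
    for df in frames[1:]:
        if df.shape[1] != frames[0].shape[1]:
            raise ValueError("column count differs between pages; check the extraction")
        # Drop a repeated header only when it matches the first page exactly.
        if df.iloc[:header_rows].values.tolist() == header:
            df = df.iloc[header_rows:]
        parts.append(df)
    return pd.concat(parts, ignore_index=True)


def table_area(x1: float, y1: float, x2: float, y2: float) -> str:
    """Format one table_areas entry; (x1, y1) is top-left, (x2, y2) bottom-right."""
    if not (x1 < x2 and y1 > y2):
        raise ValueError("(x1, y1) must lie above and to the left of (x2, y2)")
    return f"{x1:g},{y1:g},{x2:g},{y2:g}"
```

For example, stitch_pages([t.df for t in camelot.read_pdf("report.pdf", pages="3-5")]) would combine the fragments of a hypothetical report, assuming those pages contain only that one table.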

Metadata

Stars: 1656
Views: 1
Updated: 2026-02-28

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-lijie420461340-table-extractor": {
      "enabled": true,
      "auto_update": true
    }
  }
}
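
If clawhub.json already lists other plugins, pasting the snippet verbatim would replace them. A small merge script avoids that. This sketch assumes only the structure shown above (a top-level "plugins" object keyed by plugin ID) and uses a hypothetical enable_plugin helper.

```python
import json
from pathlib import Path

# Plugin ID taken from the configuration snippet on this page.
PLUGIN_ID = "official-lijie420461340-table-extractor"


def enable_plugin(config_path: str, plugin_id: str = PLUGIN_ID, auto_update: bool = True) -> dict:
    """Enable one plugin in clawhub.json, leaving other entries untouched."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Create "plugins" and the plugin entry if missing, then update in place.
    entry = config.setdefault("plugins", {}).setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": auto_update})
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```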

Tags

#table #extraction #camelot #pdf #data
Safety Score: 4/5

Flags: file-read, code-execution