Official Verified

invoice-extractor

Extract structured data from invoices and receipts (PDFs and images). Output JSON, CSV, or build a running expense ledger. Use when someone shares an invoice to process, asks to track expenses, categorize spending, or prepare tax documents.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/99rebels/rebels-invoice-extractor

Invoice Extractor 📄

Turn invoices and receipts into structured expense data. Extract from PDFs and images, auto-categorize spending, and maintain a running CSV ledger.

Hybrid approach: a Python script handles PDF text extraction and ledger management, while you (the agent) parse the invoice content, because LLMs handle varied invoice formats far better than regex.


When to Use

  • "Extract data from this invoice"
  • "Track my expenses" / "Add to my expense ledger"
  • "Categorize this receipt"
  • "Process these invoices" / "Batch process receipts"
  • "Show me my spending summary"
  • "Prepare tax documents" / "Get my expenses for April"

Setup

pip install pdfplumber
# Fallback: PyPDF2 (auto-used if pdfplumber unavailable)
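The fallback described above could work along these lines. This is a minimal sketch, not the actual contents of scripts/extract.py: the function names `pick_backend` and `extract_text` are hypothetical, and the `available` parameter exists only so the selection logic can be exercised without either library installed.

```python
import importlib.util

def pick_backend(available=None):
    """Return the first usable PDF library name, preferring pdfplumber."""
    for name in ("pdfplumber", "PyPDF2"):
        if available is not None:
            found = name in available
        else:
            found = importlib.util.find_spec(name) is not None
        if found:
            return name
    return None

def extract_text(path):
    """Extract text from every page using whichever backend is installed."""
    backend = pick_backend()
    if backend == "pdfplumber":
        import pdfplumber
        with pdfplumber.open(path) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if backend == "PyPDF2":
        from PyPDF2 import PdfReader
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    raise RuntimeError("Install pdfplumber or PyPDF2 to read PDFs")
```

Both libraries can return `None` for a page with no extractable text, which is why each page result falls back to an empty string.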

Script: scripts/extract.py (relative to this skill directory)
Config: expense-config.json (same directory)


⚡ Single Invoice Workflow

PDF Invoices

python3 scripts/extract.py pdf <file-path>

Read the output text, parse it into structured JSON (see schema below), then confirm with the user before adding it to the ledger.

Image Invoices (jpg, png, webp, gif)

Use the image tool with a prompt like: "Extract all invoice/receipt data from this image. Return vendor, invoice number, date, line items, subtotal, tax, total, and currency."

Parse the result into structured JSON, then confirm with the user before adding it to the ledger.
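As a rough illustration of what a parsed entry might look like, the sketch below builds one from the fields named in the image prompt and checks that the amounts add up. The key names are assumptions (the skill's real schema is not reproduced in this section), and the subtotal of 438.99 is derived here as total minus tax from the sample invoice shown in this document.

```python
from decimal import Decimal

# Hypothetical field names; the skill's actual schema may differ.
entry = {
    "vendor": "Amazon",
    "invoice_number": "INV-2026-001",
    "date": "2026-04-01",
    "line_items": [],
    "subtotal": "438.99",  # derived: 539.96 - 100.97
    "tax": "100.97",
    "total": "539.96",
    "currency": "EUR",
}

def totals_consistent(e):
    """True when subtotal + tax equals total, compared as exact decimals."""
    return Decimal(e["subtotal"]) + Decimal(e["tax"]) == Decimal(e["total"])
```

Keeping amounts as strings and comparing them with `Decimal` avoids the rounding surprises that binary floats introduce for currency values. A mismatch is a useful cue to re-read the invoice before asking the user to confirm.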

🔒 Confirm Then Add

Always present extracted data for user review before writing to the ledger:

📋 Invoice Extracted
Vendor: Amazon
Date: 2026-04-01
Invoice #: INV-2026-001
Description: Office supplies — keyboard and monitor
Total: €539.96 (incl. €100.97 tax)
Category: office (auto)

Add to ledger? (yes/edit/skip)
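A confirmation message like the one above could be rendered from the parsed entry with a small helper. This is a sketch under assumptions: `format_confirmation` and the dictionary keys (`symbol`, `category`, and so on) are hypothetical names, not part of the skill's script.

```python
def format_confirmation(e):
    """Render a parsed entry as the review prompt shown to the user."""
    lines = [
        "📋 Invoice Extracted",
        f"Vendor: {e['vendor']}",
        f"Date: {e['date']}",
        f"Invoice #: {e['invoice_number']}",
        f"Description: {e['description']}",
        f"Total: {e['symbol']}{e['total']} (incl. {e['symbol']}{e['tax']} tax)",
        f"Category: {e['category']}",
        "",
        "Add to ledger? (yes/edit/skip)",
    ]
    return "\n".join(lines)

# Sample values taken from the example invoice in this document.
sample = {
    "vendor": "Amazon",
    "date": "2026-04-01",
    "invoice_number": "INV-2026-001",
    "description": "Office supplies",
    "symbol": "€",
    "total": "539.96",
    "tax": "100.97",
    "category": "office (auto)",
}
```

Calling `print(format_confirmation(sample))` produces plain text; a channel-specific variant would adjust only the layout, not the fields.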

Format output for the current channel — adapt formatting to match what the platform supports. See references/formatting.md for platform-specific examples.

On confirmation, write the JSON to a temp file and run:

python3 scripts/extract.py ledger add /tmp/invoice-entry.json

Or pipe via stdin:

echo '<json>' | python3 scripts/extract.py ledger add -

If the user says "edit", modify the requested fields and re-confirm. If "skip", discard.
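The yes/edit/skip handling above can be sketched as follows. Only the `ledger add <file>` command comes from this document; the helper names, the return values "re-confirm", "discard", and "ask-again", and the temp-file naming are assumptions made for illustration.

```python
import json
import os
import tempfile

def write_entry(entry):
    """Write the confirmed entry to a temp JSON file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json", prefix="invoice-entry-")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(entry, fh, ensure_ascii=False)
    return path

def ledger_add_command(path):
    """Build the argument list for the documented `ledger add` command."""
    return ["python3", "scripts/extract.py", "ledger", "add", path]

def handle_reply(reply, entry):
    """Map the user's reply to the next step."""
    choice = reply.strip().lower()
    if choice == "yes":
        return ledger_add_command(write_entry(entry))
    if choice == "edit":
        return "re-confirm"  # apply the requested changes, then ask again
    if choice == "skip":
        return "discard"
    return "ask-again"  # assumption: unrecognized replies are not treated as consent
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the skill directory. Nothing is written to disk unless the user answers "yes", which matches the confirm-then-add rule.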


📦 Batch Processing

python3 scripts/extract.py batch <folder-path>

  1. Run the batch command to get a JSON list of all PDFs and images
  2. Process each file one at a time (PDFs via pdf command, images via image tool)
  3. Collect all results — do NOT confirm each one individually
  4. Present a summary of ALL extracted data at the end
  5. Ask the user to confirm once: add all, edit specific entries, or skip
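Step 2 of the list above amounts to routing each file by type. The sketch below assumes the batch command's JSON output is a flat list of file paths, which this document does not confirm; `route` and `IMAGE_EXTS` are hypothetical names, and the image extensions are the ones listed under "Image Invoices".

```python
from pathlib import Path

# Extensions named in this document for image invoices.
IMAGE_EXTS = {".jpg", ".png", ".webp", ".gif"}

def route(files):
    """Group file paths by how they should be processed."""
    plan = {"pdf": [], "image": [], "unknown": []}
    for f in files:
        ext = Path(f).suffix.lower()
        if ext == ".pdf":
            plan["pdf"].append(f)      # handled by the `pdf` command
        elif ext in IMAGE_EXTS:
            plan["image"].append(f)    # handled by the image tool
        else:
            plan["unknown"].append(f)  # surfaced in the final summary
    return plan
```

Results from both groups would then be collected into one list and shown in a single summary, so the user confirms once at the end rather than per file.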

Show this summary after processing all files:

📦 Batch Results — 8 files processed

Metadata

Author: @99rebels
Stars: 4473
Views: 1
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-99rebels-rebels-invoice-extractor": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.
