
invoice-extractor

Extract invoice information from images and PDF files using Baidu OCR API, export to Excel. Supports single file, multiple files, or entire directory processing. Use when the user mentions invoices, invoice recognition, extracting invoice data, processing receipts, converting invoices to Excel, or batch processing invoice files.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/aitanjp/invoice-extractor

Invoice Extractor

Extract invoice information from images (PNG, JPG) and PDF files, then export to Excel format.

Capabilities

  • Multi-format support: PNG, JPG, JPEG, BMP, TIFF, PDF
  • High accuracy: Uses Baidu OCR API specialized for invoice recognition
  • Complete fields: Extracts all invoice fields including buyer/seller info, amounts, items
  • Excel export: Formatted Excel output with summary and detail sheets
  • Flexible input: Single file, multiple files, or entire directory processing
  • Batch processing: Process hundreds of invoices in one command
  • Preview mode: List files before processing

Prerequisites

  1. Baidu Cloud OCR API credentials (free tier: 50,000 requests/day)
  2. Python environment with required packages

Quick Start

1. Setup Baidu OCR

Get API credentials from https://cloud.baidu.com/product/ocr:

  1. Register/login to Baidu Cloud
  2. Create an application
  3. Get API Key and Secret Key
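The API Key and Secret Key are normally exchanged for a short-lived access token before any OCR call. The sketch below only builds the token request URL and does not touch the network. The endpoint and parameter names follow Baidu's publicly documented OAuth flow as best understood here, so verify them against the current Baidu Cloud documentation. The function name build_token_url is hypothetical and is not taken from this skill's code.

```python
from urllib.parse import urlencode

# Assumed token endpoint for Baidu AI services; confirm in the Baidu Cloud docs.
TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key: str, secret_key: str) -> str:
    """Return the URL that exchanges an API Key / Secret Key pair for a token."""
    query = urlencode({
        "grant_type": "client_credentials",  # fixed value for this flow
        "client_id": api_key,                # the API Key from step 3
        "client_secret": secret_key,         # the Secret Key from step 3
    })
    return f"{TOKEN_URL}?{query}"


if __name__ == "__main__":
    # Placeholder credentials; sending this request is left to the caller.
    print(build_token_url("your_api_key_here", "your_secret_key_here"))
```

Keeping URL construction separate from the HTTP call makes it easy to check the request shape without spending API quota.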

2. Configure

Create config.txt in the project root:

BAIDU_API_KEY=your_api_key_here
BAIDU_SECRET_KEY=your_secret_key_here

Or run the setup wizard:

python main_baidu.py --setup
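For readers who want to reuse the same config.txt from their own scripts, here is a minimal sketch of reading the KEY=VALUE format shown above. How main_baidu.py actually parses the file is not documented here. The load_config name, the skipping of blank lines, and the treatment of lines starting with # as comments are all assumptions.

```python
from pathlib import Path


def load_config(path="config.txt"):
    """Parse KEY=VALUE lines into a dict.

    Assumption: blank lines, lines starting with '#', and lines without '='
    are ignored. The real tool may behave differently.
    """
    config = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


if __name__ == "__main__":
    cfg = load_config("config.txt")
    missing = [k for k in ("BAIDU_API_KEY", "BAIDU_SECRET_KEY") if not cfg.get(k)]
    print("missing keys:", missing)
```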

3. Run

Process a single file:

python main_baidu.py -f invoice.pdf

Process multiple files:

python main_baidu.py -f invoice1.pdf -f invoice2.png

Process entire directory:

python main_baidu.py -i ./fp

Mixed mode (directory + extra files):

python main_baidu.py -i ./fp -f extra_invoice.pdf

Output is saved to the output/ directory as an Excel file.

Workflow

Task Progress:
- [ ] Check prerequisites (Baidu API credentials)
- [ ] Choose input method (single file / multiple files / directory)
- [ ] Scan and collect invoice files
- [ ] Preview files (optional with --list)
- [ ] Process each file with Baidu OCR
- [ ] Parse invoice fields
- [ ] Export to Excel
- [ ] Verify output

Input Methods

Single File

Process one specific invoice file:

python main_baidu.py -f invoice.pdf
python main_baidu.py -f "path/to/invoice.png"

Multiple Files

Process several specific files:

python main_baidu.py -f file1.pdf -f file2.png -f file3.jpg

Entire Directory

Process all invoice files in a directory (recursive):

python main_baidu.py -i ./my_invoices
python main_baidu.py -i "/path/to/invoice/folder"

Mixed Mode

Combine directory and individual files:

python main_baidu.py -i ./fp -f ./extra/invoice.pdf
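The input methods above amount to building one list of files from an optional directory plus any explicitly named files. The following is a sketch of that collection step, assuming the extension list from the Capabilities section, a recursive directory scan, and de-duplication when a file is reached both ways. The collect_invoice_files name and the de-duplication rule are illustrative and not taken from the skill's source.

```python
from pathlib import Path

# Extensions listed under Capabilities; matching is case-insensitive here.
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".pdf"}


def collect_invoice_files(input_dir=None, files=()):
    """Return supported files from a directory (recursive) plus extra files."""
    found = []
    if input_dir is not None:
        # rglob("*") walks subdirectories too; sorting keeps the order stable.
        for path in sorted(Path(input_dir).rglob("*")):
            if path.is_file() and path.suffix.lower() in SUPPORTED:
                found.append(path)
    for name in files:
        path = Path(name)
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            found.append(path)

    # Drop duplicates (same file reached via -i and -f) but keep first-seen order.
    seen = set()
    unique = []
    for path in found:
        key = path.resolve()
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique


if __name__ == "__main__":
    for path in collect_invoice_files("./fp", ["./extra/invoice.pdf"]):
        print(path)
```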

Preview Mode

List files without processing:

python main_baidu.py -i ./fp --list
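The flags used throughout this page (-f repeated, -i, --list, --setup) could be declared with argparse roughly as follows. This is a sketch of one way to get that command-line behavior, not the actual parser in main_baidu.py. Only the flags shown on this page are defined, and the destination names (files, input_dir, list_only) are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags documented on this page."""
    parser = argparse.ArgumentParser(prog="main_baidu.py")
    # action="append" lets -f be repeated: -f a.pdf -f b.png
    parser.add_argument("-f", dest="files", action="append", default=[],
                        metavar="FILE", help="invoice file (repeatable)")
    parser.add_argument("-i", dest="input_dir", metavar="DIR",
                        help="directory to scan for invoices")
    parser.add_argument("--list", dest="list_only", action="store_true",
                        help="list matching files without processing them")
    parser.add_argument("--setup", action="store_true",
                        help="run the credential setup wizard")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Because -f appends and -i is independent, mixed mode needs no special handling: both values are simply present on the parsed namespace.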

Extracted Fields

Basic Information

  • Invoice code (发票代码)
  • Invoice number (发票号码)
  • Invoice date (开票日期)
  • Invoice type (发票类型)

Buyer Information

  • Name (购买方名称)
  • Tax number (纳税人识别号)
  • Address and phone (地址电话)
  • Bank account (开户行及账号)
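When post-processing the extracted data in code, it can help to map the Chinese labels listed above to ASCII keys. The mapping below is a sketch: the Chinese labels come from this page, while the English key names and the rename_fields helper are hypothetical and do not reflect the column names the skill actually writes to Excel.

```python
# Chinese labels as listed on this page -> hypothetical English keys.
FIELD_KEYS = {
    "发票代码": "invoice_code",
    "发票号码": "invoice_number",
    "开票日期": "invoice_date",
    "发票类型": "invoice_type",
    "购买方名称": "buyer_name",
    "纳税人识别号": "buyer_tax_number",
    "地址电话": "buyer_address_phone",
    "开户行及账号": "buyer_bank_account",
}


def rename_fields(record):
    """Return a copy of record with known Chinese labels replaced by English keys.

    Labels not present in FIELD_KEYS are kept unchanged.
    """
    return {FIELD_KEYS.get(label, label): value for label, value in record.items()}


if __name__ == "__main__":
    sample = {"发票号码": "12345678", "购买方名称": "Example Co."}  # made-up values
    print(rename_fields(sample))
```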

Seller Informa...

Metadata

Author: @aitanjp
Stars: 4473
Views: 2
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-aitanjp-invoice-extractor": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.