
azure-doc-ocr

Extract text and structured data from documents using Azure Document Intelligence (formerly Form Recognizer). Supports OCR for PDFs, images, scanned documents, handwritten text, CJK languages, tables, forms, invoices, receipts, ID documents, business cards, and tax forms. Uses the REST API v4.0 (2024-11-30) with prebuilt models for various document types.

Triggers: OCR, text extraction, Azure Document Intelligence, PDF OCR, image OCR, scanned documents, handwriting recognition, CJK text extraction, table extraction, invoice processing, receipt scanning, ID document recognition, document parsing, form extraction, Azure Form Recognizer.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/li-hongmin/azure-doc-ocr

Azure Document Intelligence OCR

Extract text and structured data from documents using Azure Document Intelligence REST API.

Quick Start

1. Environment Setup

Set your Azure Document Intelligence credentials:

export AZURE_DOC_INTEL_ENDPOINT="https://your-resource.cognitiveservices.azure.com"
export AZURE_DOC_INTEL_KEY="your-api-key"
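
A script that depends on these two variables would typically read and validate them at startup. The following is a minimal sketch of that step only. The helper name load_credentials is hypothetical, and the skill's own scripts may handle missing values differently.

```python
import os


def load_credentials(env=None):
    """Return (endpoint, key) from the environment, or raise if either is missing.

    Hypothetical helper: the variable names come from the setup step above,
    but the validation logic here is an assumption, not the skill's own code.
    """
    env = os.environ if env is None else env
    # Strip whitespace and any trailing slash so URLs can be joined safely later.
    endpoint = env.get("AZURE_DOC_INTEL_ENDPOINT", "").strip().rstrip("/")
    key = env.get("AZURE_DOC_INTEL_KEY", "").strip()
    missing = [
        name
        for name, value in (
            ("AZURE_DOC_INTEL_ENDPOINT", endpoint),
            ("AZURE_DOC_INTEL_KEY", key),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return endpoint, key
```

Failing early with the names of the missing variables tends to be easier to debug than an authentication error returned by the service later on.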

2. Single File OCR

# Basic text extraction from PDF
python scripts/ocr_extract.py document.pdf

# Extract with layout (tables, structure)
python scripts/ocr_extract.py document.pdf --model prebuilt-layout --format markdown

# Process invoice
python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json

# OCR from URL
python scripts/ocr_extract.py --url "https://example.com/document.pdf"

# Save output to file
python scripts/ocr_extract.py document.pdf --output result.txt

# Extract specific pages
python scripts/ocr_extract.py document.pdf --pages 1-3,5
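
For readers curious what a command like the --url example might do under the hood, here is a rough sketch of the v4.0 REST flow: submit the document, then poll the returned operation URL until it finishes. It is based on the publicly documented analyze endpoint, not on the source of scripts/ocr_extract.py. The function names, defaults, and timeouts are assumptions, and only the URL-source case is shown (a local file would need to be uploaded in the request body instead).

```python
import json
import time
import urllib.parse
import urllib.request

API_VERSION = "2024-11-30"  # REST API v4.0, as stated in the skill description


def build_analyze_url(endpoint, model="prebuilt-read", pages=None):
    """Build the analyze URL; `pages` is a range string such as "1-3,5"."""
    query = {"api-version": API_VERSION}
    if pages:
        query["pages"] = pages
    return (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{urllib.parse.quote(model)}:analyze?{urllib.parse.urlencode(query)}"
    )


def analyze_url_source(endpoint, key, document_url, model="prebuilt-read",
                       pages=None, poll_seconds=2.0, timeout_seconds=120.0):
    """Submit a document URL for analysis and poll until the operation ends."""
    headers = {"Ocp-Apim-Subscription-Key": key}
    submit = urllib.request.Request(
        build_analyze_url(endpoint, model, pages),
        data=json.dumps({"urlSource": document_url}).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )
    # The service answers 202 Accepted and points to the operation to poll.
    with urllib.request.urlopen(submit) as response:
        operation_url = response.headers["Operation-Location"]

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        poll = urllib.request.Request(operation_url, headers=headers)
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        status = result.get("status")
        if status == "succeeded":
            return result["analyzeResult"]
        if status == "failed":
            raise RuntimeError(f"Analysis failed: {result.get('error')}")
        time.sleep(poll_seconds)
    raise TimeoutError("Analysis did not finish before the timeout")
```

The page range is passed straight through as a query parameter, so a value such as 1-3,5 needs no client-side parsing in this sketch.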

3. Batch Processing

# Process all documents in a folder
python scripts/batch_ocr.py ./documents/

# Custom output directory and format
python scripts/batch_ocr.py ./documents/ --output-dir ./extracted/ --format markdown

# Use layout model with 8 workers
python scripts/batch_ocr.py ./documents/ --model prebuilt-layout --workers 8

# Filter specific extensions
python scripts/batch_ocr.py ./documents/ --ext .pdf,.png
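
The batch options above (extension filter, output directory, worker count) map naturally onto a thread pool. The sketch below illustrates that pattern under stated assumptions: find_documents and batch_process are hypothetical names, ocr_one stands in for whatever function performs the OCR call, and the default of 4 workers is an arbitrary choice, not the script's documented default.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def find_documents(folder, extensions=(".pdf", ".png")):
    """Return files directly inside `folder` with a matching suffix, sorted."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in wanted
    )


def batch_process(folder, ocr_one, output_dir, extensions=(".pdf", ".png"), workers=4):
    """Run `ocr_one(path) -> str` on each matching file in parallel.

    Writes one .txt file per document and returns {filename: error or None},
    so a single failing document does not abort the rest of the batch.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    outcome = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(ocr_one, path): path
            for path in find_documents(folder, extensions)
        }
        for future in as_completed(futures):
            path = futures[future]
            try:
                text = future.result()
                (out / f"{path.stem}.txt").write_text(text, encoding="utf-8")
                outcome[path.name] = None
            except Exception as exc:  # record the failure and keep going
                outcome[path.name] = str(exc)
    return outcome
```

Threads suit this workload because each task spends most of its time waiting on network responses. Raising the worker count too far may run into the service's rate limits.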

Model Selection Guide

Document Type    Recommended Model                Use Case
General text     prebuilt-read                    Pure text extraction, any document
Structured docs  prebuilt-layout                  Tables, forms, paragraphs, figures
Invoices         prebuilt-invoice                 Vendor info, line items, totals
Receipts         prebuilt-receipt                 Merchant, items, totals, dates
IDs/Passports    prebuilt-idDocument              Identity documents
Business cards   prebuilt-businessCard            Contact information
W-2 forms        prebuilt-tax.us.w2               US tax documents
Insurance cards  prebuilt-healthInsuranceCard.us  Health insurance info

See references/models.md for detailed model documentation.
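
When the model has to be chosen programmatically, the guide above can be expressed as a simple lookup. This is only an illustration: the dictionary keys and the choose_model helper are hypothetical, and falling back to prebuilt-read for unknown types is an assumption based on its "any document" use case.

```python
# Model IDs are taken from the guide above; the short keys are made up here.
MODEL_BY_DOCUMENT_TYPE = {
    "general": "prebuilt-read",
    "structured": "prebuilt-layout",
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "business_card": "prebuilt-businessCard",
    "w2": "prebuilt-tax.us.w2",
    "insurance_card": "prebuilt-healthInsuranceCard.us",
}


def choose_model(document_type):
    """Return the model ID for a document type, defaulting to prebuilt-read."""
    return MODEL_BY_DOCUMENT_TYPE.get(document_type.strip().lower(), "prebuilt-read")
```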

Supported Input Formats

  • PDF: .pdf (including scanned PDFs)
  • Images: .png, .jpg, .jpeg, .tiff, .bmp
  • URLs: Direct links to documents

Output Formats

  • text: Plain text concatenation of all extracted content
  • markdown: Structured output with headers and tables (best with layout model)
  • json: Raw API response with full extraction details
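
To make the three formats concrete, here is a simplified sketch of how an analyze result could be rendered in each of them. It assumes the result is a dictionary with a content string and an optional tables list whose cells carry rowIndex, columnIndex, and content, as in the service's documented response shape. The render and table_to_markdown functions are hypothetical, and the skill's actual markdown output (headers in particular) is likely richer than this.

```python
import json


def table_to_markdown(table):
    """Render one table from an analyze result as a pipe-delimited grid."""
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        # Flatten line breaks so each cell stays on a single table row.
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("content", "").replace("\n", " ")
    lines = ["| " + " | ".join(row) + " |" for row in grid]
    if lines:
        # Treat the first row as the header, as pipe tables require one.
        lines.insert(1, "|" + " --- |" * table["columnCount"])
    return "\n".join(lines)


def render(analyze_result, fmt="text"):
    """Render an analyze result as "text", "markdown", or "json"."""
    if fmt == "json":
        # ensure_ascii=False keeps CJK characters readable in the output.
        return json.dumps(analyze_result, ensure_ascii=False, indent=2)
    text = analyze_result.get("content", "")
    if fmt == "text":
        return text
    if fmt == "markdown":
        tables = [table_to_markdown(t) for t in analyze_result.get("tables", [])]
        return "\n\n".join([text, *tables]).strip()
    raise ValueError(f"Unknown format: {fmt}")
```

Tables are only present when a model that detects them is used, which is why the markdown format pairs best with the layout model.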

Features

  • Handwriting Recognition: Extracts handwritten text alongside printed text
  • CJK Support: Full support for Chinese, Japanese, Korean characters
  • Table Extraction: Preserves table structure (use layout model)
  • Multi-page Processing: Handles documents with multiple pages
  • Concurrent Processing: Batch script supports parallel processing
  • URL Input: Process documents directly from URLs

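Environment Variables

  • AZURE_DOC_INTEL_ENDPOINT: Endpoint URL of your Azure Document Intelligence resource (see Quick Start)
  • AZURE_DOC_INTEL_KEY: API key for that resource (see Quick Start)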

Metadata

Stars: 1656
Views: 1
Updated: 2026-02-28
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-li-hongmin-azure-doc-ocr": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.