Official | Verified | file management | Safety 3/5

PDF OCR using Gemini LLM

Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/ashtonizmev/geminipdfocr

What This Skill Does

The PDF OCR using Gemini LLM skill lets OpenClaw agents run optical character recognition (OCR) on PDF documents using Google Gemini's multimodal vision capabilities. Traditional OCR tools often struggle with complex layouts, skewed scans, or handwritten notes; this skill instead processes each page of a PDF as an image so that Gemini can interpret text, tables, and formatting. The tool splits multi-page PDFs into individual pages, uploads them to the Google API, and assembles the extracted content into readable text or structured JSON for further processing.
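The page-by-page flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual code: the function name ocr_document, the ocr_page callable (a stand-in for whatever sends one page image to Gemini), and the JSON layout with "page_count" and "pages" keys are all assumptions made for the example.

```python
import json
from itertools import islice
from typing import Callable, Iterable, Optional


def ocr_document(
    page_images: Iterable[bytes],
    ocr_page: Callable[[bytes], str],
    max_pages: Optional[int] = None,
    as_json: bool = False,
) -> str:
    """Run OCR one page at a time and merge the per-page results.

    ocr_page is a hypothetical callable standing in for the Gemini request;
    it receives one page image and returns the text found on it.
    """
    pages = []
    # islice stops after max_pages items; None means "process every page".
    for number, image in enumerate(islice(page_images, max_pages), start=1):
        pages.append({"page": number, "text": ocr_page(image).strip()})

    if as_json:
        # Assumed output shape, chosen only for this illustration.
        return json.dumps({"page_count": len(pages), "pages": pages}, indent=2)
    # Plain-text mode: separate pages with a blank line.
    return "\n\n".join(page["text"] for page in pages)


if __name__ == "__main__":
    # Fake page data and a fake OCR function, so the sketch runs offline.
    fake_pages = [b"page-1", b"page-2", b"page-3"]
    print(ocr_document(fake_pages, lambda data: f"text from {data.decode()}", max_pages=2))
```

Injecting the OCR step as a callable keeps the merging logic testable without network access or an API key.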

Installation

To integrate this skill into your environment, navigate to your OpenClaw directory and run:

clawhub install openclaw/skills/skills/ashtonizmev/geminipdfocr

Once installed, set up your local workspace by creating a virtual environment within the skill folder:

cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt

Finally, make sure GOOGLE_API_KEY is exported in your environment so the skill can authorize its API requests.
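Before running the skill, it can help to confirm that the key is actually visible to Python. The check below is a small sketch using only the standard library; the helper name require_api_key is hypothetical and not part of the skill.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GOOGLE_API_KEY from the environment, or fail with a clear message.

    Passing env explicitly (a plain dict) makes the check easy to test;
    by default the real process environment is used.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; export it before running the skill."
        )
    return key
```

Calling require_api_key() at startup surfaces a missing or empty key immediately, rather than as an authorization error partway through a document.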

Use Cases

This skill is ideal for digitizing physical paperwork, processing legacy invoices, extracting data from scanned reports, or converting non-selectable text PDFs into actionable machine-readable formats. It is particularly effective for documents containing mixed elements like diagrams, handwritten annotations, and standard text blocks that would otherwise require manual entry.

Example Prompts

  1. "OpenClaw, perform OCR on the scanned invoice located at /documents/invoices/inv_2023_09.pdf and save the results as a JSON file."
  2. "Extract the text from the first five pages of /downloads/research_paper.pdf to help me summarize the findings."
  3. "Run an OCR scan on /uploads/handwritten_notes.pdf and give me the output in a clean, plain text format."

Tips & Limitations

To manage costs and processing time, use the --max-pages flag when testing or working with exceptionally large documents. Remember that this tool sends file content to an external API; do not process highly sensitive or private information without ensuring compliance with your internal data security policies. For best results, ensure your PDF files are not password-protected before attempting to process them. Use the --json flag if you intend to pipe the output into other programmatic workflows or data analysis tools for post-processing.
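One way to catch password-protected files before sending anything to the API is a quick pre-check. The sketch below is a rough heuristic and not part of the skill: it assumes that an encrypted PDF references an /Encrypt entry in its trailer, and simply scans the raw bytes for that marker. The function names are hypothetical, and a dedicated PDF library would give a more reliable answer.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: encrypted PDFs normally carry an /Encrypt entry in the trailer.

    This is a byte scan, not a parser, so it can report false positives
    if the marker happens to appear in uncompressed page content.
    """
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("not a PDF file (missing %PDF- header)")
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(pdf_path: str) -> bool:
    """Convenience wrapper that reads the file from disk first."""
    return looks_encrypted(Path(pdf_path).read_bytes())
```

A file flagged by this check would need its password removed before OCR is attempted.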

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-ashtonizmev-geminipdfocr": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#ocr #pdf #gemini #automation #digitization

Safety Score: 3/5

Flags: file-read, file-write, external-api