PDF OCR using Gemini LLM
Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/ashtonizmev/geminipdfocr
What This Skill Does
The PDF OCR using Gemini LLM skill lets OpenClaw agents perform high-fidelity optical character recognition (OCR) on PDF documents using Google Gemini's multimodal vision capabilities. Unlike traditional OCR tools, which often struggle with complex layouts, skewed scans, or handwritten notes, this skill processes each page of a PDF as an image, allowing Gemini to interpret text, tables, and formatting. The tool splits multi-page PDFs into individual pages, uploads them securely to the Google API, and combines the extracted content into readable text or structured JSON for further processing.
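The page-by-page flow described above can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: the function names, the result layout (`page_count`, `pages`), and the injected `ocr_page` callable are all assumptions made for the example, and the real Gemini request is deliberately replaced by a stand-in so that the sketch makes no claim about the API calls the skill uses.

```python
import json
from typing import Callable, Iterable, Optional


def ocr_document(
    page_images: Iterable[bytes],
    ocr_page: Callable[[bytes], str],
    max_pages: Optional[int] = None,
) -> dict:
    """Run a per-page OCR callable over page images and collect the results.

    `ocr_page` is a hypothetical stand-in for the real Gemini request.
    """
    pages = []
    for number, image in enumerate(page_images, start=1):
        if max_pages is not None and number > max_pages:
            break  # stop early once the page cap is reached
        pages.append({"page": number, "text": ocr_page(image)})
    return {"page_count": len(pages), "pages": pages}


def to_plain_text(result: dict) -> str:
    """Join the per-page text into one readable string."""
    return "\n\n".join(page["text"] for page in result["pages"])


if __name__ == "__main__":
    # Fake "OCR" that just decodes bytes, so the example runs offline.
    fake_pages = [b"Invoice 42", b"Total: 10.00", b"Thank you"]
    result = ocr_document(fake_pages, lambda img: img.decode("utf-8"), max_pages=2)
    print(json.dumps(result, indent=2))
    print(to_plain_text(result))
```

Passing the OCR step in as a callable keeps the splitting and collection logic testable without network access or an API key.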
Installation
To integrate this skill into your environment, navigate to your OpenClaw directory and run:
clawhub install openclaw/skills/skills/ashtonizmev/geminipdfocr
Once installed, set up your local workspace by creating a virtual environment within the skill folder:
cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
Finally, export GOOGLE_API_KEY in your environment to authorize the API requests.
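Before running the skill, it can help to confirm that the GOOGLE_API_KEY variable from the last installation step is actually visible to Python. The helper below is a small sketch of such a check; `require_api_key` is a hypothetical name introduced here and is not part of the skill.

```python
import os


def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the skill.")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("GOOGLE_API_KEY is set.")
    except RuntimeError as err:
        print(err)
```

The check only confirms that the variable is present and non-empty. It does not verify that the key is valid for the Gemini API.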
Use Cases
This skill is ideal for digitizing physical paperwork, processing legacy invoices, extracting data from scanned reports, or converting non-selectable text PDFs into actionable machine-readable formats. It is particularly effective for documents containing mixed elements like diagrams, handwritten annotations, and standard text blocks that would otherwise require manual entry.
Example Prompts
- "OpenClaw, perform OCR on the scanned invoice located at /documents/invoices/inv_2023_09.pdf and save the results as a JSON file."
- "Extract the text from the first five pages of /downloads/research_paper.pdf to help me summarize the findings."
- "Run an OCR scan on /uploads/handwritten_notes.pdf and give me the output in a clean, plain text format."
Tips & Limitations
To manage costs and processing time, use the --max-pages flag when testing or working with exceptionally large documents. Remember that this tool sends file content to an external API; do not process highly sensitive or private information without ensuring compliance with your internal data security policies. For best results, ensure your PDF files are not password-protected before attempting to process them. Use the --json flag if you intend to pipe the output into other programmatic workflows or data analysis tools for post-processing.
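As a rough illustration of combining the two flags mentioned above, the sketch below assembles an argument list for a test run. Only `--max-pages` and `--json` come from this page. The entry point `ocr.py`, the assumption that `--max-pages` takes an integer value, and the helper name `build_command` are all hypothetical, so check the skill's own documentation for the real invocation.

```python
import shlex
from typing import List, Optional

# Hypothetical entry point; this page does not name the skill's script.
ENTRY_POINT = ["python", "ocr.py"]


def build_command(
    pdf_path: str,
    max_pages: Optional[int] = None,
    as_json: bool = False,
) -> List[str]:
    """Assemble an argument list using the --max-pages and --json flags."""
    if max_pages is not None and max_pages < 1:
        raise ValueError("max_pages must be a positive integer")
    command = ENTRY_POINT + [pdf_path]
    if max_pages is not None:
        # Assumes --max-pages takes an integer value.
        command += ["--max-pages", str(max_pages)]
    if as_json:
        command.append("--json")
    return command


if __name__ == "__main__":
    cmd = build_command("/uploads/handwritten_notes.pdf", max_pages=3, as_json=True)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when a PDF path contains spaces.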
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-ashtonizmev-geminipdfocr": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, file-write, external-api
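If clawhub.json already lists other plugins, pasting the fragment above by hand risks overwriting them. The sketch below merges the plugin entry into an existing configuration instead. It assumes only the structure shown in the fragment (a top-level "plugins" object keyed by plugin ID); `enable_plugin` and "some-other-plugin" are hypothetical names used for illustration.

```python
import json

PLUGIN_ID = "official-ashtonizmev-geminipdfocr"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of `config` with the plugin entry added or updated."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated plugin.
    existing = {"plugins": {"some-other-plugin": {"enabled": False}}}
    print(json.dumps(enable_plugin(existing), indent=2))
```

The function returns a new dictionary rather than editing the input in place, so the original configuration can be kept as a backup before writing the result to disk.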