Official Verified data analysis Safety 4/5

paddleocr-text-recognition

Use this skill whenever the user wants text extracted from images, photos, scans, screenshots, or scanned PDFs. Returns exact machine-readable strings with line-level text and optional bbox coordinates. Strong accuracy for CJK, small print, and handwritten text. Trigger terms: OCR, 文字识别, 图片转文字, 截图识字, 提取图中文字, 扫描识字, 识字, 纯文字, plain text extraction, 坐标, 检测框, bbox, bounding box, image to text, screenshot, photo scan, recognize text.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/bobholamovic/paddleocr-text-recognition

What This Skill Does

The paddleocr-text-recognition skill provides an optical character recognition (OCR) interface within the OpenClaw environment. It uses the PaddleOCR engine to extract printed or handwritten text, along with coordinate-based spatial information (bounding boxes), from images and PDF documents. This lets the agent read visual inputs such as invoices, screenshots, or scanned documents without relying on its own image interpretation. Execution goes through a dedicated Python wrapper that processes the input, parses the result, and returns it to the agent as structured JSON.

Installation

To integrate this skill into your OpenClaw agent, run the following command in your terminal:

clawhub install openclaw/skills/skills/bobholamovic/paddleocr-text-recognition

Once the repository is synced, navigate to the skill directory (skills/paddleocr-text-recognition) and install the dependencies:

pip install -r scripts/requirements.txt

Use Cases

This skill suits workflows that need text extracted from files with no text layer, such as images and scanned PDFs. Typical scenarios include:

Converting image-based receipts or invoices into structured machine-readable text for expense reporting.
Digitizing historical scans or images of documents that lack underlying text layers.
Extracting specific fields, tables, or text blocks from structured forms located in images or PDF files.
Automating the ingestion of data from URLs pointing to visual content, such as web-based product specifications or diagrams.

Example Prompts

"Please extract all the text from this invoice image and save it as a JSON object so I can process the total amount."
"I've uploaded a screenshot of a technical error log. Use PaddleOCR to read the text and highlight any lines containing 'Critical Failure'."
"Go to this URL: https://example.com/receipt.pdf, run OCR on the document, and list all line items found in the table."
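The second prompt above amounts to filtering the recognized lines for a phrase. A minimal sketch of that step follows, assuming a hypothetical result shape: a list of dictionaries with a "text" key and an optional "bbox" key. This page does not document the skill's JSON schema, so the key names and the find_lines helper are illustrative only.

```python
from typing import Dict, List


def find_lines(lines: List[Dict], needle: str) -> List[Dict]:
    """Return the recognized lines whose text contains `needle` (case-insensitive).

    Assumption: each line is a dict with a "text" key and an optional "bbox" key.
    The real output schema of the skill may differ.
    """
    needle_lower = needle.lower()
    return [ln for ln in lines if needle_lower in ln.get("text", "").lower()]


# Hypothetical OCR output for a screenshot of an error log.
sample = [
    {"text": "INFO service started", "bbox": [10, 10, 200, 30]},
    {"text": "Critical Failure: disk full", "bbox": [10, 40, 260, 60]},
    {"text": "retrying..."},  # bbox is optional
]

hits = find_lines(sample, "Critical Failure")
for hit in hits:
    print(hit["text"], hit.get("bbox"))
```

Matching case-insensitively is a choice made here because OCR output can vary in capitalization; an exact match would work the same way with the lower() calls removed.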

Tips & Limitations

Direct API Execution: Always run the provided scripts/ocr_caller.py script. Do not process images by any other method.
Error Handling: If the API returns an error, show it to the user as-is. Do not fall back to built-in vision capabilities; the skill's operating rules prohibit it.
File Management: By default, results are saved in the system temp directory. Use the --stdout flag to parse results immediately without writing a file, or set a specific path with --output.
Input Precision: Ensure the file path or URL provided is accessible and correctly formatted to avoid API timeouts or connection failures.
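The tips above can be sketched as a small wrapper around scripts/ocr_caller.py. Only the script path and the --stdout and --output flags come from this page. How the input file or URL is passed is not documented here, so the sketch leaves it to the caller as input_args. Treating --stdout and --output as mutually exclusive, and assuming that --stdout prints JSON, are also assumptions of this sketch.

```python
import json
import subprocess
import sys
from typing import List, Optional


def build_ocr_command(input_args: List[str],
                      output: Optional[str] = None,
                      to_stdout: bool = False) -> List[str]:
    """Assemble the argument list for scripts/ocr_caller.py.

    `input_args` holds whatever arguments select the input file or URL;
    their names are not given on this page, so the caller supplies them.
    """
    if output and to_stdout:
        # Assumption of this sketch: pick one destination, not both.
        raise ValueError("choose either --output or --stdout, not both")
    cmd = [sys.executable, "scripts/ocr_caller.py", *input_args]
    if to_stdout:
        cmd.append("--stdout")
    elif output:
        cmd += ["--output", output]
    return cmd  # with neither option, results go to the system temp directory


def run_ocr(input_args: List[str]):
    """Run the script with --stdout and parse its output.

    Assumption: with --stdout the script prints a JSON document.
    """
    cmd = build_ocr_command(input_args, to_stdout=True)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface the error to the user; do not fall back to other methods.
        raise RuntimeError(proc.stderr.strip() or "ocr_caller.py failed")
    return json.loads(proc.stdout)
```

Run this from the skill directory (skills/paddleocr-text-recognition) so the relative script path resolves. Raising on a non-zero exit code keeps the error visible to the user, in line with the Error Handling tip.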

Metadata

Author: @bobholamovic

Stars: 4190

Updated: 2026-04-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-bobholamovic-paddleocr-text-recognition": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#ocr #image-processing #text-recognition #data-extraction #paddleocr

Safety Score: 4/5

Flags: file-read, file-write, code-execution

Related Skills

paddleocr-doc-parsing

Use this skill to extract structured Markdown/JSON from PDFs and document images—tables with cell-level precision, formulas as LaTeX, figures, seals, charts, headers/footers, multi-column layout and correct reading order. Trigger terms: 文档解析, 版面分析, 版面还原, 表格提取, 公式识别, 多栏排版, 扫描件结构化, 发票, 财报, 复杂 PDF, PDF转Markdown, 图表, 阅读顺序; reading order, formula, LaTeX, layout parsing, structure extraction, PP-StructureV3, PaddleOCR-VL.