Official Verified data analysis Safety 4/5

product-doc-reader

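Structured extractor for product engineering drawings, v5.0. Uses pdftotext first with Vision as a fallback, and supports soft-hyphen cleanup, cross-line association, and data validation. Designed specifically for Farreach cable product drawings.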

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/cjboy007/ssa-product-doc-reader

What This Skill Does

product-doc-reader is an AI agent skill that extracts structured data from complex product engineering drawings in PDF form. It was built for Farreach cable and connector product lines and uses a hybrid v5.0 pipeline: pdftotext runs first because it preserves alphanumeric data with high fidelity, and the Gemini 2.5 Flash Vision API serves as a fallback for layout understanding, dimension tables, and visual schematics. The skill also strips soft hyphens (\xad), associates length values that span multiple lines, and applies heuristic filters that drop irrelevant values such as electrical parameters (e.g., 300V), cable specifications, and internal coding schemas.
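The cleanup and filtering steps described above can be illustrated with a minimal Python sketch. It is not the skill's actual code: the function names and both regular expressions are assumptions chosen only to show the idea of removing soft hyphens and dropping voltage-like or bare-length values.

```python
import re

SOFT_HYPHEN = "\xad"

# Hypothetical noise patterns, chosen only to illustrate the idea.
VOLTAGE_RE = re.compile(r"^\d+(?:\.\d+)?\s*V$", re.IGNORECASE)
LENGTH_RE = re.compile(r"^\d+(?:\.\d+)?\s*(?:mm|cm|m)$", re.IGNORECASE)


def clean_soft_hyphens(text: str) -> str:
    """Remove soft hyphens (U+00AD) that PDF text extraction can leave inside words."""
    return text.replace(SOFT_HYPHEN, "")


def is_noise(value: str) -> bool:
    """Return True for values that look like a voltage rating or a bare length."""
    stripped = value.strip()
    return bool(VOLTAGE_RE.match(stripped) or LENGTH_RE.match(stripped))


def filter_fields(values: list[str]) -> list[str]:
    """Clean each value, then drop empty strings and noise-like values."""
    cleaned = (clean_soft_hyphens(v).strip() for v in values)
    return [v for v in cleaned if v and not is_noise(v)]
```

Calling `filter_fields(["USB\xad-C cable", "300V", "1500mm", "HDMI 2.1"])` keeps only the two product descriptions. A real filter would likely need more patterns than these two.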

Installation

To install this skill, use the OpenClaw CLI hub command: clawhub install openclaw/skills/skills/cjboy007/ssa-product-doc-reader

Ensure you have the following system dependencies installed for optimal performance:

  1. Python 3 (runtime environment)
  2. poppler (provides pdftotext and pdftoppm; on macOS, run brew install poppler)
  3. tesseract (optional, for OCR fallback support)
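Before running the skill, it can help to confirm that the command-line tools are on the PATH. The following is a small sketch using only the Python standard library. The split into required and optional tools is an assumption based on the list above, and `missing_tools` is a hypothetical helper, not part of the skill.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed grouping: poppler's binaries are treated as required, tesseract as optional.
REQUIRED_TOOLS = ("pdftotext", "pdftoppm")
OPTIONAL_TOOLS = ("tesseract",)


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tool names that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED_TOOLS), ("optional", OPTIONAL_TOOLS)):
        missing = missing_tools(tools)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} tools: {status}")
```

The `which` parameter is injectable so the check can be exercised without touching the real PATH.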

Use Cases

  • Automated BOM Generation: Rapidly extract Bill of Materials from multi-page PDFs to populate ERP or internal database fields.
  • Engineering Knowledge Base: Convert unstructured PDF diagrams into machine-readable JSON and Markdown files for seamless integration with tools like Obsidian.
  • Quality Assurance & Comparison: Detect variations between drawing versions by comparing extracted specification matrices.
  • Custom Template Handling: Adapt proprietary formats (e.g., C331 templates) into standard database schemas for consistent downstream processing.
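For the version-comparison use case, the comparison itself can be as simple as a key-by-key diff of two extracted records. The sketch below assumes each drawing version has already been extracted into a flat dictionary. The field names in the usage note are invented for illustration and do not reflect the skill's real output schema.

```python
from typing import Any


def diff_specs(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each changed key to a (before, after) pair; absent keys appear as None."""
    changes: dict[str, tuple[Any, Any]] = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes
```

For example, comparing `{"model": "A1", "length": "1.5m"}` with `{"model": "A1", "length": "2.0m", "color": "black"}` reports changes for `length` and `color` only. Note that this sketch cannot tell a missing key apart from a key whose value is `None`.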

Example Prompts

  1. "Analyze the technical drawing at /downloads/wire_spec_599.pdf and extract all items into a JSON format for my product database."
  2. "Use the product-doc-reader to scan this PDF and highlight the model number and packaging specification without including electrical voltage ratings."
  3. "Extract the test requirement table from the attached engineering diagram and output the results as a clean Markdown table."

Tips & Limitations

  • Hybrid Strategy: Always prefer the default hybrid mode; it balances speed (text extraction) with intelligence (Vision API for complex tables).
  • Data Integrity: The tool automatically filters out common noise such as 'BJ' packaging codes and pure length values to ensure you only get the relevant product attributes.
  • Performance: For high-volume batch processing of PDFs that contain a text layer, use the --text-only flag to cut API costs and compute time.
  • Limitations: Results depend heavily on PDF quality. Scanned or low-resolution PDFs may need the --vision-only mode and a higher DPI setting to keep confidence scores high.
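The text-first, vision-fallback strategy in the tips above can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: `extract_with_fallback`, the `min_chars` threshold, and the `vision_extractor` callable are hypothetical. Only the `pdftotext -layout <file> -` invocation is a real poppler command, where `-` writes the text to standard output.

```python
import subprocess
from typing import Callable


def pdftotext_extract(pdf_path: str) -> str:
    """Run poppler's pdftotext and return the extracted text from stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_with_fallback(
    pdf_path: str,
    text_extractor: Callable[[str], str],
    vision_extractor: Callable[[str], str],
    min_chars: int = 50,  # hypothetical threshold for "enough text was found"
) -> tuple[str, str]:
    """Try text extraction first; use the vision extractor when too little text comes back."""
    try:
        text = text_extractor(pdf_path)
    except (OSError, subprocess.CalledProcessError):
        # pdftotext is not installed or failed on this file: treat as no text.
        text = ""
    if len(text.strip()) >= min_chars:
        return "text", text
    return "vision", vision_extractor(pdf_path)
```

A caller would pass `pdftotext_extract` as `text_extractor` and its own wrapper around a vision API as `vision_extractor`. Keeping both as parameters makes the decision logic easy to test with stand-in functions.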

Metadata

Author: @cjboy007
Stars: 3562
Views: 0
Updated: 2026-03-29

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-cjboy007-ssa-product-doc-reader": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#engineering #pdf-parser #automation #data-extraction #manufacturing
Safety Score: 4/5

Flags: file-read, file-write, external-api, code-execution

Related Skills

logistics

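Logistics management skill that provides bill of lading generation, customs declaration document generation, logistics tracking, and related functions. Supports OKKI customer data synchronization and automated document processing.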

cjboy007 3562

okki-email-sync

Synchronize email activities and quotation events with OKKI CRM as follow-up trail records. Automatically matches emails to CRM customers via domain lookup and vector search, creates trail records (email type=102, quotation type=101), and deduplicates entries. Requires OKKI CRM API access and optional vector search setup. Use when you need to automatically log email communications and quotation events in your CRM.

cjboy007 3562

follow-up-engine

Automated customer follow-up scheduling and execution engine for B2B sales. Generates personalized follow-up email drafts based on customer stage, last contact date, and follow-up strategy. Integrates with CRM systems (configurable) to sync follow-up records. Use when you need to automate outbound sales follow-ups, schedule reminders, or generate follow-up email content for dormant leads.

cjboy007 3562

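Quotation Workflow (报价单工作流)

Automatically generates quotations (Excel/Word/HTML/PDF), with integrated data validation that keeps sample data out of the output. Supports OKKI CRM.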

cjboy007 3562

auto-evolution

Multi-agent auto-evolution system — orchestrate review-execute-audit loops with 4 roles (Coordinator, Reviewer, Executor, Auditor). A single coordinator agent drives the loop by spawning sub-agents for review, execution, and audit. Break goals into subtasks, auto-iterate with dual quality gates, and auto-package results. Use when: user wants autonomous task execution with built-in quality assurance.

cjboy007 3562