product-doc-reader
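Structured extractor for product engineering drawings, v5.0. Uses pdftotext first with a Vision fallback, and supports soft-hyphen cleanup, cross-line association, and data validation. Designed specifically for Farreach cable product drawings.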
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/cjboy007/ssa-product-doc-reader
What This Skill Does
The product-doc-reader is a specialized AI agent skill designed for precision extraction of structured data from complex product engineering drawings (PDFs). Specifically engineered for Farreach cable and connector product lines, this tool utilizes a hybrid v5.0 processing architecture. It prioritizes the pdftotext extraction method for high-fidelity alphanumeric data preservation, while employing Gemini 2.5 Flash Vision API as a robust fallback for layout understanding, dimension tables, and visual schematics. The skill includes advanced features such as automatic cleaning of soft hyphens (\xad), multi-line string association for length values, and rigorous heuristic-based data filtering to remove irrelevant noise like electrical parameters (e.g., 300V), cable specifications, and internal coding schemas.
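The cleanup steps described above can be illustrated with a minimal Python sketch. This is not the skill's actual code: the function names, the regular expressions, and the unit list are all assumptions chosen for illustration, and the real heuristics may differ.

```python
import re

SOFT_HYPHEN = "\xad"  # U+00AD, sometimes left inside tokens by text extraction

# Hypothetical noise patterns, loosely modeled on the categories named above.
VOLTAGE_RE = re.compile(r"^\d+(\.\d+)?\s*V$", re.IGNORECASE)              # e.g. "300V"
PURE_LENGTH_RE = re.compile(r"^\d+(\.\d+)?\s*(mm|cm|m)$", re.IGNORECASE)  # e.g. "1500mm"
UNIT_ONLY_RE = re.compile(r"(mm|cm|m)", re.IGNORECASE)


def clean_soft_hyphens(text: str) -> str:
    """Remove soft hyphens so tokens such as part numbers stay intact."""
    return text.replace(SOFT_HYPHEN, "")


def join_split_lengths(lines):
    """Attach a unit that landed on its own line to the preceding number."""
    merged = []
    for line in lines:
        stripped = line.strip()
        if merged and UNIT_ONLY_RE.fullmatch(stripped) and re.search(r"\d$", merged[-1]):
            merged[-1] += stripped
        else:
            merged.append(stripped)
    return merged


def is_noise(token: str) -> bool:
    """Return True for tokens that look like a voltage rating or a bare length."""
    token = token.strip()
    return bool(VOLTAGE_RE.match(token) or PURE_LENGTH_RE.match(token))


def clean_tokens(raw_tokens):
    """Strip soft hyphens, then drop empty tokens and assumed noise."""
    cleaned = (clean_soft_hyphens(t).strip() for t in raw_tokens)
    return [t for t in cleaned if t and not is_noise(t)]
```

For example, `clean_tokens(["FR\xad-1234", "300V", "PVC"])` keeps only the part-number-like token and the material name in this sketch.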
Installation
To install this skill, use the OpenClaw CLI hub command:
clawhub install openclaw/skills/skills/cjboy007/ssa-product-doc-reader
Ensure you have the following system dependencies installed for optimal performance:
- Python 3 (runtime environment)
- poppler (run brew install poppler for pdftoppm support)
- tesseract (optional, for OCR fallback support)
Use Cases
- Automated BOM Generation: Rapidly extract Bill of Materials from multi-page PDFs to populate ERP or internal database fields.
- Engineering Knowledge Base: Convert unstructured PDF diagrams into machine-readable JSON and Markdown files for seamless integration with tools like Obsidian.
- Quality Assurance & Comparison: Detect variations between drawing versions by comparing extracted specification matrices.
- Custom Template Handling: Adapt proprietary formats (e.g., C331 templates) into standard database schemas for consistent downstream processing.
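As a rough illustration of the version-comparison use case, the sketch below diffs two extracted specification records. The field names and the flat-dictionary shape are hypothetical; the skill's real JSON schema is not documented here.

```python
def diff_specs(old: dict, new: dict) -> dict:
    """Compare two flat spec dictionaries and report what differs."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical extracted records for two revisions of the same drawing.
rev_a = {"model": "FR-1234", "jacket": "PVC", "packaging": "PE bag"}
rev_b = {"model": "FR-1234", "jacket": "TPE", "label": "yes"}

report = diff_specs(rev_a, rev_b)
```

Here `report["changed"]` holds the old and new value for each field present in both revisions, which is usually the part a QA reviewer wants to see first.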
Example Prompts
- "Analyze the technical drawing at /downloads/wire_spec_599.pdf and extract all items into a JSON format for my product database."
- "Use the product-doc-reader to scan this PDF and highlight the model number and packaging specification without including electrical voltage ratings."
- "Extract the test requirement table from the attached engineering diagram and output the results as a clean Markdown table."
Tips & Limitations
- Hybrid Strategy: Always prefer the default hybrid mode; it balances speed (text extraction) with intelligence (Vision API for complex tables).
- Data Integrity: The tool automatically filters out common noise such as 'BJ' packaging codes and pure length values to ensure you only get the relevant product attributes.
- Performance: For high-volume batch processing, use the --text-only flag to significantly reduce API costs and compute time if the PDFs contain a text layer.
- Limitations: Performance is highly dependent on PDF quality. Scanned, low-resolution PDFs may require the --vision-only mode and increased DPI settings to maintain high confidence scores.
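The hybrid strategy in the tips above can be sketched as a simple decision: extract text with pdftotext first, and fall back to Vision only when too little text comes back. The threshold, the function names, and the fallback rule itself are assumptions for illustration; the skill's actual criteria are not documented here. Only the `pdftotext -layout <file> -` invocation is standard poppler usage.

```python
import subprocess


def run_pdftotext(pdf_path: str) -> str:
    """Run poppler's pdftotext; the trailing "-" writes the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def needs_vision_fallback(text: str, min_chars: int = 50) -> bool:
    """Assumed heuristic: very little alphanumeric text suggests a scanned PDF."""
    alnum_count = sum(ch.isalnum() for ch in text)
    return alnum_count < min_chars


def choose_mode(text: str, text_only: bool = False, vision_only: bool = False) -> str:
    """Pick "text" or "vision", mirroring the --text-only / --vision-only flags."""
    if vision_only:
        return "vision"
    if text_only:
        return "text"
    return "vision" if needs_vision_fallback(text) else "text"
```

In a batch job, `choose_mode(run_pdftotext(path))` would be evaluated per file, so the Vision API is called only for PDFs that appear to lack a usable text layer.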
Metadata
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-cjboy007-ssa-product-doc-reader": {
"enabled": true,
"auto_update": true
}
}
}
Tags (AI)
Flags: file-read, file-write, external-api, code-execution
Related Skills
logistics
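Logistics management skill that provides bill of lading generation, customs declaration document generation, shipment tracking, and related functions. Supports OKKI customer data synchronization and automated document processing.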
okki-email-sync
Synchronize email activities and quotation events with OKKI CRM as follow-up trail records. Automatically matches emails to CRM customers via domain lookup and vector search, creates trail records (email type=102, quotation type=101), and deduplicates entries. Requires OKKI CRM API access and optional vector search setup. Use when you need to automatically log email communications and quotation events in your CRM.
follow-up-engine
Automated customer follow-up scheduling and execution engine for B2B sales. Generates personalized follow-up email drafts based on customer stage, last contact date, and follow-up strategy. Integrates with CRM systems (configurable) to sync follow-up records. Use when you need to automate outbound sales follow-ups, schedule reminders, or generate follow-up email content for dormant leads.
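报价单工作流 (Quotation Workflow)
Automatically generates quotations (Excel/Word/HTML/PDF), with integrated data validation to keep sample data out of the output. Supports OKKI CRM.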
auto-evolution
Multi-agent auto-evolution system — orchestrate review-execute-audit loops with 4 roles (Coordinator, Reviewer, Executor, Auditor). A single coordinator agent drives the loop by spawning sub-agents for review, execution, and audit. Break goals into subtasks, auto-iterate with dual quality gates, and auto-package results. Use when: user wants autonomous task execution with built-in quality assurance.