siphonclaw
Document intelligence pipeline with visual search, OCR, and field capture
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/curtisgc1/siphonclaw
SiphonClaw
Domain-agnostic document intelligence pipeline. Ingest PDFs, images, and spreadsheets into a searchable knowledge base with dual-track retrieval (text + visual), OCR, confidence scoring, and field capture.
Built for field service engineers, researchers, mechanics, and anyone who needs fast answers from large document collections.
What SiphonClaw Does
- Ingest documents (PDF, Excel, images, screenshots) into a local vector database with text and visual embeddings
- Search using triple hybrid retrieval: BM25 keyword matching + semantic text vectors + visual page embeddings, fused with reciprocal rank fusion (RRF) and reranked with a cross-encoder
- Identify equipment, parts, or components from photos using vision models, then search the local knowledge base
- Capture field fixes and repair notes as first-class knowledge base entries for future retrieval
- Score every response with a composite confidence value (retrieval + faithfulness + relevance + coverage) and attach footnote-style source citations
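The fusion step in the search bullet above can be illustrated with a minimal sketch of reciprocal rank fusion. This is not SiphonClaw's actual implementation: the function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration only.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical per-retriever results (best match first).
bm25_hits = ["manual_p42", "catalog_p15", "note_7"]
text_hits = ["catalog_p15", "manual_p42", "manual_p43"]
visual_hits = ["manual_p42", "manual_p43", "catalog_p15"]

fused = reciprocal_rank_fusion([bm25_hits, text_hits, visual_hits])
print(fused)
```

RRF only looks at rank positions, so the three retrievers do not need comparable score scales. A cross-encoder reranker would then reorder the top of this fused list.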
MCP Tools
SiphonClaw exposes five tools via MCP for integration with agents and other MCP-compatible clients.
siphonclaw_search
Search the knowledge base using triple hybrid retrieval (text + visual + keyword).
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | yes | Natural language search query or exact part number / error code |
| top_k | integer | no | Number of results to return (default: 5, max: 20) |
| filters | object | no | Metadata filters (e.g., {"source_type": "service_manual", "model": "ModelA"}) |
| mode | string | no | Search mode: "hybrid" (default), "text", "visual", "keyword" |
Returns:
{
  "results": [
    {
      "content": "Extracted text from the matching chunk or page",
      "source": "ServiceManual_ModelA.pdf",
      "page": 42,
      "section": "4.3 Transformer Replacement",
      "score": 0.92,
      "match_type": "hybrid"
    }
  ],
  "confidence": 0.87,
  "confidence_tier": "Confident - verify part number",
  "keywords_used": ["low voltage supply", "assembly mount", "ModelA"],
  "citations": ["[1] ServiceManual_ModelA, page 42", "[2] Parts Catalog PC-1102, page 15"]
}
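As a rough sketch of how a client might call this tool, the snippet below builds an MCP `tools/call` request for `siphonclaw_search` and summarizes a response shaped like the example above. The helper names `build_search_call` and `summarize` are hypothetical, the lower bound of 1 on `top_k` is an assumption (the table only states the maximum), and `summarize` assumes the JSON payload has already been extracted from the MCP tool result.

```python
VALID_MODES = {"hybrid", "text", "visual", "keyword"}

def build_search_call(query, top_k=5, filters=None, mode="hybrid", request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' request for siphonclaw_search."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= top_k <= 20:  # max of 20 is documented; min of 1 is assumed
        raise ValueError("top_k must be between 1 and 20")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    arguments = {"query": query, "top_k": top_k, "mode": mode}
    if filters:
        arguments["filters"] = filters
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "siphonclaw_search", "arguments": arguments},
    }

def summarize(response):
    """Return one line per hit plus a confidence line."""
    lines = [
        f'{r["source"]} p.{r["page"]} (score {r["score"]:.2f})'
        for r in response["results"]
    ]
    lines.append(
        f'confidence {response["confidence"]:.2f}: {response["confidence_tier"]}'
    )
    return lines

request = build_search_call(
    "transformer replacement", top_k=3, filters={"model": "ModelA"}
)

# Trimmed copy of the documented response shape, used here as sample input.
sample_response = {
    "results": [
        {"source": "ServiceManual_ModelA.pdf", "page": 42, "score": 0.92}
    ],
    "confidence": 0.87,
    "confidence_tier": "Confident - verify part number",
}
summary = summarize(sample_response)
```

Validating `top_k` and `mode` on the client side catches malformed requests before they reach the server. The server's own validation behavior is not described in this document.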
siphonclaw_ingest
Add a document or photo to the knowledge base. Supports PDF, Excel, images (JPG/PNG), and screenshots.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| file_path | string | yes | Absolute path to the file to ingest |
| source_type | string | no | Document type hint: "manual", "parts_catalog", "field_note", "photo", "other" (default: auto-detect) |
| metadata | object | no | Additional metadata to attach (e.g., {"model": "ModelA", "domain": "industrial"}) |
Returns:
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-curtisgc1-siphonclaw": {
      "enabled": true,
      "auto_update": true
    }
  }
}