Official · Verified · data analysis · Safety 4/5

Habib Pdf To Json

Extract structured data from construction PDFs. Convert specifications, BOMs, schedules, and reports from PDF to Excel/CSV/JSON. Use OCR for scanned documents and pdfplumber for native PDFs.

Why use this skill?

Habib Pdf To Json extracts structured data from construction BOMs, specs, and reports. Convert PDF to Excel, CSV, or JSON using OCR and pdfplumber with OpenClaw.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/dbmoradi60/habib-pdf-to-json

What This Skill Does

Habib Pdf To Json is an OpenClaw skill that turns unstructured construction documents into machine-readable data. Construction projects keep large amounts of data locked in PDFs: Bills of Materials (BOMs), Bills of Quantities, material specifications, project schedules, and progress reports. The skill uses pdfplumber for native, text-based PDFs and Tesseract OCR for scanned documents such as digitized blueprints and reports. Converting these documents to Excel, CSV, or JSON lets users feed the data into BIM software, project management tools, or automated analysis pipelines. The approach follows the DDC (Data-Driven Construction) methodology, as outlined in Chapter 2.4 of the professional data engineering standards.
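As an illustration of the native-PDF path, here is a minimal sketch of how tables might be pulled out with pdfplumber and serialized to JSON. It is not the skill's actual implementation: the function names `table_to_records` and `pdf_tables_to_json` are hypothetical, and the sketch assumes the first row of each table holds the column headers.

```python
import json
from typing import Optional

Row = list[Optional[str]]


def table_to_records(table: list[Row]) -> list[dict[str, str]]:
    """Treat the first row as headers and turn the remaining rows into dicts."""
    if not table:
        return []
    # Empty or missing header cells get a positional placeholder name.
    headers = [(cell or "").strip() or f"column_{i}" for i, cell in enumerate(table[0])]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if not any(cells):
            continue  # skip rows where every cell is empty
        records.append(dict(zip(headers, cells)))
    return records


def pdf_tables_to_json(pdf_path: str) -> str:
    """Extract every table from a native PDF and return one JSON string."""
    import pdfplumber  # third-party; imported lazily so the helper above works without it

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() yields each table as a list of rows of cell strings (or None).
            for table in page.extract_tables():
                records.extend(table_to_records(table))
    return json.dumps(records, indent=2)
```

Keeping the row-to-record conversion separate from the PDF access means the same helper could also be applied to rows recovered through OCR.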

Installation

To add this skill to your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/dbmoradi60/habib-pdf-to-json

Make sure your local environment has the required dependencies: the Python packages pdfplumber (document parsing) and pandas (data manipulation), plus Tesseract OCR if you intend to process scanned PDFs. Tesseract is a separate system program rather than a Python package, so it has to be installed on its own.
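One way to confirm the dependencies before running the skill is a small check script. The sketch below is an assumption about how such a check could look, not something shipped with the skill; `check_dependencies` is a hypothetical helper, and it assumes the Tesseract executable is named `tesseract` and is on the PATH.

```python
import importlib.util
import shutil


def check_dependencies(modules=("pdfplumber", "pandas"), binaries=("tesseract",)):
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    for name in binaries:
        # shutil.which returns None when no executable of that name is on PATH.
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```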

Use Cases

  • Automated BOM Analysis: Extracting thousands of items from a Bill of Quantities PDF into a JSON format for immediate budget calculation.
  • Regulatory Compliance: Converting legacy scanned specification sheets into searchable CSV files to ensure project materials meet current building codes.
  • Progress Reporting: Parsing monthly PDF project reports into structured Excel files to feed into automated progress tracking dashboards.
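To make the BOM use case concrete, the following sketch totals a budget from extracted JSON records. The field names `quantity` and `unit_price`, the helper `bom_total`, and the sample rows are all made up for illustration; real BOMs will use whatever headers appear in the source PDF.

```python
import json
from decimal import Decimal, InvalidOperation


def bom_total(records, qty_key="quantity", price_key="unit_price"):
    """Sum quantity * unit price; return (total, indexes of rows that failed to parse)."""
    total = Decimal("0")
    skipped = []
    for index, record in enumerate(records):
        try:
            # Strip thousands separators before parsing.
            qty = Decimal(str(record[qty_key]).replace(",", ""))
            price = Decimal(str(record[price_key]).replace(",", ""))
        except (KeyError, InvalidOperation):
            skipped.append(index)  # leave unparseable rows for manual review
            continue
        total += qty * price
    return total, skipped


# Illustrative data only, not taken from a real document.
sample = json.loads("""[
  {"item": "Rebar 12mm", "quantity": "1,200", "unit_price": "0.85"},
  {"item": "Concrete C30", "quantity": "45", "unit_price": "110.00"},
  {"item": "Formwork", "quantity": "n/a", "unit_price": "12.50"}
]""")
total, skipped = bom_total(sample)
print(total, skipped)
```

Using `Decimal` rather than `float` avoids rounding drift when many line items are summed, and returning the skipped row indexes keeps parsing failures visible instead of silently dropping them.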

Example Prompts

  1. "Open the file 'Structural_BOM.pdf' in the downloads folder, extract all tables, and save the result as a structured JSON file for my ERP system."
  2. "Look through 'Electrical_Specs.pdf' and convert the material requirements table into a CSV format, ensuring the headers are mapped correctly."
  3. "Process the scanned 'Site_Inspection_Report.pdf' using OCR, extract the key metrics, and generate a summary report in Excel."
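The header mapping requested in the second prompt could be handled along these lines. This is a sketch under stated assumptions: `HEADER_MAP`, `normalize_header`, and `rows_to_csv` are hypothetical names, and the mapping entries are examples rather than a list the skill is known to use.

```python
import csv
import io

# Hypothetical mapping from headers as printed in a PDF to canonical column names.
HEADER_MAP = {
    "item description": "description",
    "desc.": "description",
    "qty": "quantity",
    "unit": "unit",
}


def normalize_header(raw):
    """Lowercase, collapse whitespace, then map to a canonical name if one is known."""
    key = " ".join((raw or "").lower().split())
    return HEADER_MAP.get(key, key.replace(" ", "_"))


def rows_to_csv(rows):
    """Write rows (first row = headers) to a CSV string with mapped headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([normalize_header(h) for h in rows[0]])
    writer.writerows(rows[1:])
    return buffer.getvalue()
```

Unknown headers fall through to a snake_case version of the original text, so no column is lost when the mapping is incomplete.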

Tips & Limitations

For best results, use source PDFs that are legible and have a consistent table structure. Native PDFs parsed with pdfplumber generally extract reliably because the text is read directly from the file. Scanned documents depend on OCR, so output quality is limited by the quality of the original image; use high-resolution scans where possible. Always review extracted numerical data, since OCR is prone to misreading characters, especially with complex formatting or rotated text.
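Part of the numeric review can be automated by flagging cells that do not look like numbers. The sketch below is one possible check, assuming records are dicts of strings and that numbers use a dot as the decimal mark with optional comma thousands separators; `flag_suspect_cells` is a hypothetical helper, not part of the skill.

```python
import re

# Plain digits, or digits grouped in thousands with commas, with an optional decimal part.
NUMERIC = re.compile(r"-?(\d{1,3}(,\d{3})*|\d+)(\.\d+)?")


def flag_suspect_cells(records, numeric_columns):
    """Return (row_index, column, value) for cells that need manual review."""
    suspects = []
    for i, record in enumerate(records):
        for column in numeric_columns:
            value = str(record.get(column, "")).strip()
            # Anything that is not a clean number (for example "l2O" for "120") is flagged.
            if not NUMERIC.fullmatch(value):
                suspects.append((i, column, value))
    return suspects
```

The check only flags values and never corrects them, because guessing at the intended digit could introduce errors that are harder to spot than the original misread.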

Metadata

Stars: 2387
Views: 1
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-dbmoradi60-habib-pdf-to-json": {
      "enabled": true,
      "auto_update": true
    }
  }
}
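If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. A small merge script is one way to avoid that. The sketch below assumes the file layout shown above and nothing more about the clawhub.json format; `enable_plugin` is a hypothetical helper.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-dbmoradi60-habib-pdf-to-json"


def enable_plugin(config_path):
    """Add the plugin entry to clawhub.json, keeping any existing entries."""
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```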

Tags (AI)

#construction #data-extraction #ocr #pdf-processing #automation
Safety Score: 4/5

Flags: file-read, file-write, code-execution