Official | Verified | data analysis | Safety 4/5

Parquet Converter

Convert construction data to/from Parquet format. Optimize storage, enable fast queries, and integrate with data lakehouses.

Why use this skill?

Optimize your construction data workflows. Convert bulky CSV files to high-performance, compressed Parquet format for faster analytics and seamless lakehouse integration.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/datadrivenconstruction/parquet-converter

What This Skill Does

The Parquet Converter is an OpenClaw agent skill that moves construction data from legacy formats into the Apache Parquet format used by modern data lakehouses. It converts bulky, slow-to-query CSV or Excel files to Parquet. Because Parquet stores data by column, it typically compresses better and lets queries read only the columns they need, which matters when analyzing multi-year construction project data, financial records, or project schedules. The skill handles schema definitions, partition management (e.g., partitioning by project status or type), and compression settings with codecs such as Snappy, Gzip, or Zstd.

Installation

To integrate this skill into your environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/datadrivenconstruction/parquet-converter

Use Cases

Project Analytics: Converting large project cost trackers to Parquet so that budget vs. actual cost queries across thousands of projects read only the columns they need and return faster.
Data Lakehouse Integration: Transforming site sensor data or daily logs into partitioned Parquet files for ingestion into platforms like Snowflake, Databricks, or Amazon Athena.
Storage Optimization: Reducing the storage footprint of archival construction data while preserving data integrity and column type definitions.
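To put a number on the storage saving for a given dataset, the sizes before and after conversion can be compared directly. The following is a small sketch using only the Python standard library. The helper names are hypothetical and are not part of the skill.

```python
import os


def path_size(path):
    """Return the size in bytes of a file, or of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def storage_savings(csv_path, parquet_path):
    """Return (csv_bytes, parquet_bytes, percent_saved) for a converted pair."""
    csv_bytes = path_size(csv_path)
    parquet_bytes = path_size(parquet_path)
    if csv_bytes == 0:
        return csv_bytes, parquet_bytes, 0.0  # avoid division by zero
    percent_saved = 100.0 * (csv_bytes - parquet_bytes) / csv_bytes
    return csv_bytes, parquet_bytes, percent_saved
```

Because `path_size` walks directories, the same call works when the Parquet output is a partitioned dataset directory and not a single file.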

Example Prompts

"Convert all CSV files in the /project_data/costs directory to Parquet format using Snappy compression to optimize our current storage."
"I need to update the project data schema. Can you re-process the latest construction reports using the standard 'projects' schema and partition them by status?"
"Summarize the conversion results for last month’s schedule data. How much space did we save by switching from CSV to Parquet?"

Tips & Limitations

Pre-defined Schemas: Always prefer using the pre-defined schemas in the ParquetConverter class for consistency across your organization's datasets.
Partitioning Strategy: Be mindful of your partition columns. Choosing high-cardinality columns (like unique transaction IDs) for partitions can lead to an excessive number of small files, which negates the performance benefits of Parquet. Stick to low-cardinality metadata like 'project_status' or 'project_type'.
Memory Usage: Parquet files are compact on disk, but a conversion that loads an entire dataset into memory before writing still needs enough RAM to hold it. Plan memory accordingly for very large conversions.
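The partitioning tip above can be checked before a conversion by counting the distinct values in each candidate column. The sketch below uses only the Python standard library. The helper names and the `max_distinct=50` threshold are illustrative assumptions, not values defined by the skill.

```python
import csv


def partition_cardinality(csv_path, columns):
    """Count distinct values per candidate partition column in a CSV file."""
    distinct = {col: set() for col in columns}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for col in columns:
                distinct[col].add(row[col])
    return {col: len(values) for col, values in distinct.items()}


def safe_partition_columns(csv_path, columns, max_distinct=50):
    """Keep only columns whose distinct-value count is at most max_distinct.

    The threshold is an arbitrary illustrative default; tune it to the
    dataset size and the query engine in use.
    """
    counts = partition_cardinality(csv_path, columns)
    return [col for col in columns if counts[col] <= max_distinct]
```

A column such as a unique transaction ID would be filtered out here, while a status or type column with a handful of values would pass.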

Metadata

Author: @datadrivenconstruction

Stars: 1100

Updated: 2026-02-17

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-datadrivenconstruction-parquet-converter": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#parquet #construction #data-engineering #analytics #storage

Safety Score: 4/5

Flags: file-write, file-read, code-execution

Related Skills

data-lineage-tracker

Track data origin, transformations, and flow through construction systems. Essential for audit trails, compliance, and debugging data issues.

datadrivenconstruction 3376

cwicr-cost-calculator

Calculate construction costs using DDC CWICR resource-based methodology. Break down costs into labor, materials, equipment with transparent pricing.

datadrivenconstruction 3376

data-anomaly-detector

Detect anomalies and outliers in construction data: unusual costs, schedule variances, productivity spikes. Statistical and ML-based detection methods.

datadrivenconstruction 3376

historical-cost-analyzer

Analyze historical construction costs for benchmarking, trend analysis, and estimating calibration. Compare projects, track escalation, identify patterns.

datadrivenconstruction 3376

df-merger

Merge pandas DataFrames from multiple construction sources. Handle different schemas, keys, and data quality issues.

datadrivenconstruction 3376