Parquet Converter
Convert construction data to/from Parquet format. Optimize storage, enable fast queries, and integrate with data lakehouses.
Why use this skill?
Optimize your construction data workflows. Convert bulky CSV files to high-performance, compressed Parquet format for faster analytics and seamless lakehouse integration.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/datadrivenconstruction/parquet-converter
What This Skill Does
The Parquet Converter is an OpenClaw agent skill that converts construction data from row-oriented formats such as CSV or Excel into Apache Parquet for use in data lakehouses. Because Parquet stores data by column, it typically compresses better and lets queries read only the columns they need, which matters when analyzing multi-year construction project data, financial records, or project schedules. The skill handles schema definitions, partition management (e.g., partitioning by project status or type), and compression settings using codecs such as Snappy, Gzip, or Zstd.
Installation
To integrate this skill into your environment, run the following command in your terminal:
clawhub install openclaw/skills/skills/datadrivenconstruction/parquet-converter
Use Cases
- Project Analytics: Converting massive project cost trackers to Parquet to run sub-second queries on budget vs. actual cost across thousands of projects.
- Data Lakehouse Integration: Transforming site sensor data or daily logs into partitioned Parquet files for seamless ingestion into platforms like Snowflake, Databricks, or Amazon Athena.
- Storage Optimization: Drastically reducing the physical storage footprint of archival construction data without sacrificing data integrity or type definitions.
Example Prompts
- "Convert all CSV files in the /project_data/costs directory to Parquet format using Snappy compression to optimize our current storage."
- "I need to update the project data schema. Can you re-process the latest construction reports using the standard 'projects' schema and partition them by status?"
- "Summarize the conversion results for last month’s schedule data. How much space did we save by switching from CSV to Parquet?"
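The space-savings question in the last prompt comes down to simple arithmetic over file sizes. A minimal sketch of that calculation, using only the Python standard library, might look like this; the function names and the returned dictionary keys are assumptions for illustration, not part of the skill.

```python
from pathlib import Path


def total_bytes(directory, suffix):
    """Sum the sizes of all files under `directory` whose names end with `suffix`."""
    return sum(
        p.stat().st_size
        for p in Path(directory).rglob(f"*{suffix}")
        if p.is_file()
    )


def storage_savings(csv_dir, parquet_dir):
    """Compare the on-disk size of the CSV sources with the Parquet output."""
    csv_size = total_bytes(csv_dir, ".csv")
    parquet_size = total_bytes(parquet_dir, ".parquet")
    saved = csv_size - parquet_size
    # Avoid dividing by zero when no CSV files were found.
    percent = 100.0 * saved / csv_size if csv_size else 0.0
    return {
        "csv_bytes": csv_size,
        "parquet_bytes": parquet_size,
        "saved_bytes": saved,
        "saved_percent": percent,
    }
```

The actual ratio depends heavily on the data and the codec chosen, so measuring it on your own files is more reliable than assuming a fixed figure.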
Tips & Limitations
- Pre-defined Schemas: Always prefer using the pre-defined schemas in the ParquetConverter class for consistency across your organization's datasets.
- Partitioning Strategy: Be mindful of your partition columns. Choosing high-cardinality columns (like unique transaction IDs) for partitions can lead to an excessive number of small files, which negates the performance benefits of Parquet. Stick to low-cardinality metadata like 'project_status' or 'project_type'.
- Memory Usage: Parquet files are compact on disk, but a conversion still buffers data in memory before writing. Make sure enough RAM is available for very large conversions.
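One way to apply the partitioning tip before converting is to count the distinct values in each candidate column. The sketch below streams a CSV with the standard library so the whole file never has to fit in memory; the helper names and the `max_distinct=50` threshold are illustrative assumptions, not values defined by the skill.

```python
import csv
from collections import defaultdict


def column_cardinality(csv_path, columns):
    """Count distinct values per column by streaming the file row by row."""
    seen = defaultdict(set)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            for col in columns:
                seen[col].add(row[col])
    return {col: len(seen[col]) for col in columns}


def safe_partition_columns(csv_path, columns, max_distinct=50):
    """Keep only the candidate columns whose distinct-value count is small.

    `max_distinct` is an arbitrary illustrative threshold; tune it to your data.
    """
    counts = column_cardinality(csv_path, columns)
    return [col for col in columns if counts[col] <= max_distinct]
```

A column such as a unique transaction ID would exceed any reasonable threshold and be dropped from the list, while 'project_status' or 'project_type' would normally pass.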
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-datadrivenconstruction-parquet-converter": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, code-execution
Related Skills
data-lineage-tracker
Track data origin, transformations, and flow through construction systems. Essential for audit trails, compliance, and debugging data issues.
cwicr-cost-calculator
Calculate construction costs using DDC CWICR resource-based methodology. Break down costs into labor, materials, equipment with transparent pricing.
data-anomaly-detector
Detect anomalies and outliers in construction data: unusual costs, schedule variances, productivity spikes. Statistical and ML-based detection methods.
historical-cost-analyzer
Analyze historical construction costs for benchmarking, trend analysis, and estimating calibration. Compare projects, track escalation, identify patterns.
df-merger
Merge pandas DataFrames from multiple construction sources. Handle different schemas, keys, and data quality issues.