data-quality-check
Assess construction data quality using completeness, accuracy, consistency, timeliness, and validity metrics. Automated validation with regex patterns, thresholds, and reporting.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/datadrivenconstruction/data-quality-check
Data Quality Check for Construction
Overview
Based on DDC methodology (Chapter 2.6), this skill provides comprehensive data quality assessment for construction projects. Poor data quality leads to poor decisions - validate early, validate often.
Book Reference: "Data Quality Requirements"
"Data quality is determined by five key metrics: completeness, accuracy, consistency, timeliness, and validity." (DDC Book, Chapter 2.6)
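Of the five metrics named above, timeliness and consistency are the least obvious to compute. The following is a minimal sketch, not the book's method: the function names and the columns `Quantity`, `UnitPrice`, `Cost`, and `LastUpdated` are hypothetical, and the 30-day window and 1% tolerance are placeholder thresholds.

```python
import numpy as np
import pandas as pd


def check_timeliness(df, date_col, max_age_days, now=None):
    """Percentage of rows whose date is at most max_age_days older than `now`."""
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    dates = pd.to_datetime(df[date_col], errors="coerce")  # unparseable -> NaT
    # Rows with a missing date compare as False and count as stale.
    fresh = (now - dates) <= pd.Timedelta(days=max_age_days)
    return float(fresh.mean() * 100)


def check_consistency(df, qty_col, price_col, total_col, rel_tol=0.01):
    """Percentage of rows where total is within rel_tol of quantity * unit price."""
    expected = (df[qty_col] * df[price_col]).to_numpy(dtype=float)
    actual = df[total_col].to_numpy(dtype=float)
    consistent = np.isclose(actual, expected, rtol=rel_tol)  # NaN never matches
    return float(consistent.mean() * 100)


# Hypothetical sample data, for illustration only.
sample = pd.DataFrame({
    "Quantity": [10, 5, 2, 4],
    "UnitPrice": [100.0, 20.0, 50.0, 25.0],
    "Cost": [1000.0, 100.0, 120.0, 100.0],
    "LastUpdated": ["2024-01-10", "2024-01-01", "2023-06-01", None],
})

timeliness = check_timeliness(sample, "LastUpdated", max_age_days=30, now="2024-01-15")
consistency = check_consistency(sample, "Quantity", "UnitPrice", "Cost")
print(f"Timeliness: {timeliness:.1f}%")
print(f"Consistency: {consistency:.1f}%")
```

Passing `now` explicitly keeps the timeliness score reproducible between runs; leaving it out uses the current time.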
Quick Start
import pandas as pd
# Load construction data
df = pd.read_excel("bim_export.xlsx")
# Quick quality check
quality_score = {
    'completeness': (1 - df.isnull().sum().sum() / df.size) * 100,
    'unique_ids': df['ElementId'].nunique() == len(df),
    'valid_volumes': (df['Volume_m3'] >= 0).all()
}
print(f"Completeness: {quality_score['completeness']:.1f}%")
print(f"Unique IDs: {quality_score['unique_ids']}")
print(f"Valid volumes: {quality_score['valid_volumes']}")
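The skill description mentions regex-based validation, which the quick check above does not cover. A minimal sketch of a pattern-based validity check follows; the function name `check_validity`, the `Level` column, and both patterns are assumptions chosen for illustration, so substitute the formats your project actually uses.

```python
import pandas as pd


def check_validity(df, patterns):
    """Per column, the percentage of non-null values that fully match a regex.

    Returns None for a column that is absent or has no non-null values.
    """
    results = {}
    for col, pattern in patterns.items():
        if col not in df.columns:
            results[col] = None
            continue
        values = df[col].dropna().astype(str)
        if values.empty:
            results[col] = None
            continue
        results[col] = float(values.str.fullmatch(pattern).mean() * 100)
    return results


# Hypothetical sample data and patterns, for illustration only.
elements = pd.DataFrame({
    "ElementId": ["100234", "100235", "A-17", None],
    "Level": ["L01", "L02", "Roof", "L03"],
})
validity = check_validity(elements, {
    "ElementId": r"\d+",     # assumed: IDs are digit strings
    "Level": r"L\d{2}",      # assumed: level codes look like L01, L02, ...
    "Category": r"[A-Z]+",   # column not present in the sample
})
for col, score in validity.items():
    print(col, "n/a" if score is None else f"{score:.1f}%")
```

Null values are excluded here because missing data is already counted by the completeness metric; counting it again under validity would penalize the same gap twice.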
Data Quality Dimensions
The 5 Quality Metrics
import pandas as pd
import numpy as np
import re
from datetime import datetime, timedelta
class DataQualityChecker:
    """Comprehensive data quality assessment for construction data"""

    def __init__(self, df):
        self.df = df.copy()
        self.results = {}
        self.issues = []

    def check_completeness(self, required_columns=None):
        """Check for missing values (completeness)"""
        if required_columns is None:
            required_columns = self.df.columns.tolist()
        completeness = {}
        for col in required_columns:
            if col in self.df.columns:
                non_null = self.df[col].notna().sum()
                total = len(self.df)
                completeness[col] = (non_null / total) * 100
            else:
                completeness[col] = 0
                self.issues.append(f"Missing required column: {col}")
        overall = np.mean(list(completeness.values()))
        self.results['completeness'] = {
            'by_column': completeness,
            'overall': overall,
            'threshold': 95,
            'passed': overall >= 95
        }
        return self.results['completeness']

    def check_accuracy(self, rules=None):
        """Check data accuracy against rules (accuracy)"""
        if rules is None:
            # Default construction data rules
            rules = {
                'Volume_m3': {'min': 0, 'max': 10000},
                'Area_m2': {'min': 0, 'max': 100000},
                'Weight_kg': {'min': 0, 'max': 1000000},
                'Cost': {'min': 0, 'max': 100000000}
            }
        accuracy = {}
        for col, bounds in rules.items():
            if col in self.df.columns:
                valid = self.df[col].between(
                    bounds.get('min', -np.inf),
                    bounds.get('max', n...
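The listing is cut off partway through check_accuracy. The block below is not the original continuation; it is a standalone sketch of range-based accuracy checking with pandas `Series.between`, under the assumption that each rule is a dict with optional `min` and `max` keys as in the defaults above. The function name `range_accuracy` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def range_accuracy(df, rules):
    """Per column, the percentage of rows whose value lies within [min, max].

    Missing or non-numeric values count as out of range.
    Columns named in `rules` but absent from `df` are skipped.
    """
    accuracy = {}
    for col, bounds in rules.items():
        if col not in df.columns:
            continue
        values = pd.to_numeric(df[col], errors="coerce")  # non-numeric -> NaN
        in_range = values.between(
            bounds.get("min", -np.inf),
            bounds.get("max", np.inf),
        )  # bounds are inclusive; NaN yields False
        accuracy[col] = float(in_range.mean() * 100)
    return accuracy


# Hypothetical sample data, for illustration only.
quantities = pd.DataFrame({
    "Volume_m3": [12.5, -3.0, 20000.0, 45.0],
    "Area_m2": [100.0, 250.0, None, 80.0],
})
scores = range_accuracy(quantities, {
    "Volume_m3": {"min": 0, "max": 10000},
    "Area_m2": {"min": 0, "max": 100000},
    "Weight_kg": {"min": 0},  # column not present in the sample, so skipped
})
for col, score in scores.items():
    print(f"{col}: {score:.1f}% within bounds")
```

Treating missing values as out of range is one possible design choice; dropping nulls before the check would instead leave them to the completeness metric.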
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-datadrivenconstruction-data-quality-check": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Related Skills
data-lineage-tracker
Track data origin, transformations, and flow through construction systems. Essential for audit trails, compliance, and debugging data issues.
cwicr-cost-calculator
Calculate construction costs using DDC CWICR resource-based methodology. Break down costs into labor, materials, equipment with transparent pricing.
data-anomaly-detector
Detect anomalies and outliers in construction data: unusual costs, schedule variances, productivity spikes. Statistical and ML-based detection methods.
historical-cost-analyzer
Analyze historical construction costs for benchmarking, trend analysis, and estimating calibration. Compare projects, track escalation, identify patterns.
df-merger
Merge pandas DataFrames from multiple construction sources. Handle different schemas, keys, and data quality issues.