Official Verified data analysis Safety 5/5

dataset-intake-auditor

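Checks fields, units, missing-value rates, outliers, and usability before a new dataset is onboarded. Use for data, dataset, and audit workflows. Do not use for fabricating statistical results or as a replacement for a formal data governance platform.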

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/52yuanchangxing/dataset-intake-auditor

What This Skill Does

The dataset-intake-auditor is an OpenClaw AI agent skill for automated data quality assessment. Before a new dataset enters your workflows, it audits field definitions, measurement units, missing-value rates, outliers, and overall usability. It acts as a gatekeeper that checks whether raw data is consistent and ready for downstream processing, which helps prevent "garbage in, garbage out" scenarios.
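To make two of these checks concrete, here is a minimal Python sketch of a missing-value ratio and an outlier scan for a CSV file. It is an illustration only, not the skill's actual implementation: the function name `audit_columns`, the `MISSING_TOKENS` set, and the choice of the 1.5 × IQR rule are all assumptions made for this example.

```python
import csv
import io
import statistics

# Assumed convention for what counts as "missing"; adjust to your data.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "nan"}


def audit_columns(csv_text):
    """Return, per column, the missing-value ratio and IQR-based outliers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        return {}
    report = {}
    for column in reader.fieldnames or []:
        raw = [(row.get(column) or "").strip() for row in rows]
        present = [v for v in raw if v.lower() not in MISSING_TOKENS]
        numeric = []
        for value in present:
            try:
                numeric.append(float(value))
            except ValueError:
                pass  # non-numeric values are ignored by the outlier scan
        outliers = []
        if len(numeric) >= 4:
            # Quartiles computed with the "inclusive" method from the stdlib.
            q1, _, q3 = statistics.quantiles(numeric, n=4, method="inclusive")
            spread = q3 - q1
            low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
            outliers = [x for x in numeric if x < low or x > high]
        report[column] = {
            "missing_ratio": (len(raw) - len(present)) / len(raw),
            "outliers": outliers,
        }
    return report


SAMPLE = """age,city
31,Berlin
29,
,Paris
30,Oslo
32,Rome
400,Lima
"""

if __name__ == "__main__":
    for name, stats in audit_columns(SAMPLE).items():
        print(name, stats)
```

The sketch only reads its input and returns a report, in keeping with the read-only design described on this page. The IQR rule is one common heuristic; other outlier definitions may suit a given dataset better.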

Installation

To integrate this skill into your environment, run the following command in your terminal: clawhub install openclaw/skills/skills/52yuanchangxing/dataset-intake-auditor

Use Cases

This skill is ideal for data engineers, analysts, and developers who frequently handle CSV or TSV files.

  1. Pre-Ingestion Screening: Validate a new batch of data before it is imported into a production database.
  2. Documentation Generation: Automatically generate field summaries and metadata reports based on raw input files.
  3. Anomaly Detection: Identify suspicious values or structural inconsistencies that might break a pipeline.
  4. Audit Trails: Create a "review-first" report for compliance, allowing teams to audit the data before any transformations occur.

Example Prompts

  1. "Please audit the customer_churn.csv file in the data folder. I need a report on the missing value ratios for all columns and a summary of any detected outliers."
  2. "I have a new dataset provided by the marketing team. Can you use the dataset-intake-auditor to check if the currency units are consistent and identify any schema drift compared to our standard spec.json?"
  3. "Evaluate the raw_sensor_data.tsv file. Generate a draft overview and provide an ingestion recommendation based on the field quality."
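The schema drift check mentioned in the second prompt can be pictured with a short Python sketch. The layout of spec.json is not documented on this page, so the `"columns"` key below is a hypothetical format, and `find_schema_drift` is an illustrative name rather than part of the skill.

```python
import csv
import io
import json


def find_schema_drift(csv_text, spec_json):
    """Compare a CSV header against the column list in a spec document."""
    # Hypothetical spec layout: {"columns": ["id", "amount", ...]}
    expected = json.loads(spec_json)["columns"]
    header = next(csv.reader(io.StringIO(csv_text)), [])
    shared_in_file_order = [c for c in header if c in expected]
    shared_in_spec_order = [c for c in expected if c in header]
    return {
        "missing_columns": [c for c in expected if c not in header],
        "unexpected_columns": [c for c in header if c not in expected],
        "order_changed": shared_in_file_order != shared_in_spec_order,
    }


if __name__ == "__main__":
    spec = '{"columns": ["id", "amount", "currency"]}'
    data = "id,currency,amount,region\n1,USD,9.5,EU\n"
    print(find_schema_drift(data, spec))
```

Only the header row is compared here. A fuller check would also compare column types and units, which this sketch leaves out.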

Tips & Limitations

  • Safety First: The skill is designed for read-only analysis. It will not write or modify your source files.
  • Limitations: Do not use this tool to fabricate statistical results or as a substitute for a full-scale data governance platform.
  • Missing Info: If the dataset is incomplete, the tool will prioritize listing 'To-Be-Confirmed' items rather than hallucinating statistics.
  • Workflow Integration: Always review the generated review draft ("可审阅草案") before running any automated ingestion scripts.
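The "To-Be-Confirmed" behavior described above can be sketched in Python as follows. This is a hedged illustration of the idea, not the skill's code: `summarize_field`, the `min_samples` threshold, and the `to_be_confirmed` status string are all assumptions for this example.

```python
import statistics


def summarize_field(name, values, min_samples=3):
    """Summarize one numeric field, deferring when evidence is too thin."""
    observed = [v for v in values if v is not None]
    if len(observed) < min_samples:
        # Too few observations: flag the field instead of reporting a number.
        return {"field": name, "status": "to_be_confirmed", "observed": len(observed)}
    return {
        "field": name,
        "status": "ok",
        "observed": len(observed),
        "mean": statistics.fmean(observed),
    }


if __name__ == "__main__":
    print(summarize_field("temperature", [20.0, None, None]))
    print(summarize_field("temperature", [19.0, 20.0, 21.0]))
```

Returning an explicit status instead of a computed value keeps thin or incomplete fields visible to the reviewer, so they are not hidden behind a statistic drawn from one or two rows.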

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-52yuanchangxing-dataset-intake-auditor": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#data #dataset #audit #ingestion
Safety Score: 5/5

Flags: file-read, code-execution