dataset-intake-auditor
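Checks fields, units, missing-value rates, outliers, and usability before a new dataset is onboarded. Use for data, dataset, and audit workflows; do not use for fabricating statistical results or replacing a formal data governance platform.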
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/52yuanchangxing/dataset-intake-auditor
What This Skill Does
The dataset-intake-auditor is an OpenClaw AI agent skill designed for automated data quality assessment. Before integrating any new dataset into your workflows, this tool performs comprehensive audits covering field definitions, measurement units, missing value percentages, outlier detection, and overall data usability. It acts as a gatekeeper to ensure that raw data is consistent, reliable, and ready for downstream processing, effectively preventing "garbage-in-garbage-out" scenarios.
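The two quantitative checks named above, missing value percentages and outlier detection, can be illustrated with a short Python sketch. This is not the skill's actual implementation, which the listing does not show. The function names, the set of tokens treated as missing, and the choice of Tukey's IQR fences for outliers are all assumptions made for illustration.

```python
import csv
import io
import statistics

# Assumption: these tokens count as "missing"; the skill's real rules are not documented here.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "nan"}


def missing_ratios(rows, fieldnames):
    """Return {column: fraction of rows whose value is missing}."""
    total = len(rows)
    ratios = {}
    for name in fieldnames:
        missing = sum(
            1 for row in rows
            if (row.get(name) or "").strip().lower() in MISSING_TOKENS
        )
        ratios[name] = missing / total if total else 0.0
    return ratios


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    if len(values) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def audit_csv(text, delimiter=","):
    """Read-only audit of CSV/TSV text: row count, missing ratios, numeric outliers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    report = {
        "rows": len(rows),
        "missing": missing_ratios(rows, fieldnames),
        "outliers": {},
    }
    for name in fieldnames:
        numeric = []
        for row in rows:
            cell = (row.get(name) or "").strip()
            if cell.lower() in MISSING_TOKENS:
                continue  # missing cells are counted above, not parsed
            try:
                numeric.append(float(cell))
            except ValueError:
                break  # a non-numeric cell: treat the column as non-numeric
        else:
            flagged = iqr_outliers(numeric)
            if flagged:
                report["outliers"][name] = flagged
    return report
```

For a TSV file, the same sketch would be called with delimiter="\t". The input is only parsed in memory and nothing is written back.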
Installation
To integrate this skill into your environment, run the following command in your terminal:
clawhub install openclaw/skills/skills/52yuanchangxing/dataset-intake-auditor
Use Cases
This skill is ideal for data engineers, analysts, and developers who frequently handle CSV or TSV files.
- Pre-Ingestion Screening: Validate a new batch of data before it is imported into a production database.
- Documentation Generation: Automatically generate field summaries and metadata reports based on raw input files.
- Anomaly Detection: Identify suspicious values or structural inconsistencies that might break a pipeline.
- Audit Trails: Create a "review-first" report for compliance, allowing teams to audit the data before any transformations occur.
Example Prompts
- "Please audit the customer_churn.csv file in the data folder. I need a report on the missing value ratios for all columns and a summary of any detected outliers."
- "I have a new dataset provided by the marketing team. Can you use the dataset-intake-auditor to check if the currency units are consistent and identify any schema drift compared to our standard spec.json?"
- "Evaluate the raw_sensor_data.tsv file. Generate a draft overview and provide an ingestion recommendation based on the field quality."
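The schema drift check mentioned in the second prompt might look roughly like the Python sketch below. The layout of spec.json is not described in this listing, so the {"fields": {name: type}} shape used here is purely hypothetical, as is the function name schema_drift.

```python
import json


def schema_drift(header, spec):
    """Compare a file's header against the expected field list in a spec.

    Assumption: spec looks like {"fields": {"column_name": "type", ...}}.
    """
    expected = list(spec["fields"])
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    # Columns present on both sides, in file order versus spec order.
    shared_in_file = [name for name in header if name in expected]
    shared_in_spec = [name for name in expected if name in header]
    return {
        "missing": missing,
        "unexpected": unexpected,
        "reordered": shared_in_file != shared_in_spec,
    }


# Hypothetical spec and header, for illustration only.
example_spec = json.loads(
    '{"fields": {"id": "int", "amount": "float", "currency": "str"}}'
)
example_drift = schema_drift(["id", "currency", "amount", "note"], example_spec)
```

Here example_drift reports no missing columns, one unexpected column (note), and a changed column order.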
Tips & Limitations
- Safety First: The skill is designed for read-only analysis. It will not write or modify your source files.
- Limitations: Do not use this tool as a substitute for a full-scale data governance platform or for falsifying statistical results.
- Missing Info: If the dataset is incomplete, the tool will prioritize listing 'To-Be-Confirmed' items rather than hallucinating statistics.
- Workflow Integration: Always review the generated "可审阅草案" (review draft) before proceeding to any automated ingestion scripts.
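The "list what is unknown instead of guessing" behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the skill's logic: the function name, the status labels, and the 20% missing-value threshold are all hypothetical.

```python
def ingestion_recommendation(missing, units, max_missing=0.2):
    """Build a review-first summary from per-column missing ratios and known units.

    Assumptions: `missing` maps column -> missing ratio, `units` maps
    column -> unit string (absent or empty when unknown), and max_missing
    is an arbitrary illustrative threshold.
    """
    # Unknown units are listed for confirmation rather than inferred.
    to_confirm = sorted(name for name in missing if not units.get(name))
    high_missing = sorted(
        name for name, ratio in missing.items() if ratio > max_missing
    )
    if to_confirm:
        status = "needs_confirmation"
    elif high_missing:
        status = "needs_attention"
    else:
        status = "no_blockers_found"
    return {
        "status": status,
        "to_be_confirmed": to_confirm,
        "high_missing": high_missing,
    }
```

Even a "no_blockers_found" result in this sketch is only an input to the human review step, not a trigger for automated ingestion.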
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-52yuanchangxing-dataset-intake-auditor": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags
Flags: file-read, code-execution
Related Skills
securityvitals
Security vitals checker for OpenClaw. Scans your installation, scores your setup, and shows you exactly what to fix. First scan in seconds.
sealvera
Tamper-evident audit trail for AI agent decisions. Use when logging LLM decisions, setting up AI compliance, auditing agents for EU AI Act, HIPAA, GDPR or SOC 2, or when a user asks about AI decision audit trails, explainability, or SealVera.
doc-gap-finder
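Scans documentation directories, heading structure, and file distribution to find missing sections, duplicated content, and outdated areas. Use for docs, audit, and knowledge workflows; do not use for reading directories without permission or directly modifying the original documents.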
cron-job-guardian
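Checks cron or timer configurations for frequency, idempotency, retry, logging, and concurrency risks. Use for cron, timer, and ops workflows; do not use for directly starting or stopping production jobs or replacing real monitoring.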
scrapebadger
Web scraping platform — Twitter/X data, Vinted marketplace, and general web scraping API