ClawKit Logo
ClawKitReliability Toolkit
Back to Registry
Official Verified

etl-generator

大数据 ETL 流程生成器 - 根据源表 DDL 生成标准化 ETL 加工 SQL(HiveSQL/MySQL)

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/alexmayanjun-collab/etl-generator
Or

ETL 流程生成器 - 大数据专家版

根据源表 DDL 自动生成标准化的 ETL 加工 SQL,支持 HiveSQL、MySQL、ODPS。

🎯 角色定位

大数据专家(20 年经验)

  • 精通 HiveSQL、MySQL、Shell、Python
  • 严格遵守大数据 ETL 加工规范
  • 注意字段类型转换、时区处理、数据质量

🔧 核心功能

1️⃣ 表名规范

  • 源表: ods_[表名]_di
  • 目标表: dwd_[表名]_di

2️⃣ 字段类型转换

  • _at_time 结尾的 TIMESTAMP 字段 → STRING(时区转换)
  • _date 结尾的字段 → STRING(不转换)
  • 其他字段 → 保持原类型

3️⃣ 时区转换

DATE_FORMAT(FROM_UTC_TIMESTAMP(created_at, "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS created_at
DATE_FORMAT(FROM_UTC_TIMESTAMP(updated_at, "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS updated_at

4️⃣ 分区字段

DATE_FORMAT(FROM_UTC_TIMESTAMP(created_at, "${timezone}"), "yyyy-MM-dd") AS ds

5️⃣ 增量数据处理

  • 使用 ods_data_base_di
  • 支持 INSERT/UPDATE/DELETE 操作
  • 通过 _operation__after_image_ 识别

6️⃣ 去重逻辑

ROW_NUMBER() OVER(PARTITION BY id ORDER BY updated_at DESC) as rn
WHERE rn = 1

7️⃣ 字段排除 rn

SELECT `(rn)?+.+` FROM (...)

📋 使用方式

方式 1:命令行

# 从文件读取 DDL
python3 skills/etl-generator/etl_generator.py source_table.ddl > etl_sql.sql

# 从标准输入读取
cat source_table.ddl | python3 skills/etl-generator/etl_generator.py > etl_sql.sql

方式 2:直接调用

from etl_generator import parse_table_ddl, generate_target_table_ddl, generate_etl_sql

ddl = """
CREATE TABLE IF NOT EXISTS ods_delivery_attempt_di(
  id STRING COMMENT '主键',
  pno STRING COMMENT '运单号',
  client_id STRING COMMENT '客户 ID',
  returned BIGINT COMMENT '是否退货件',
  delivery_date STRING COMMENT '派送日期',
  marker_id BIGINT COMMENT '标记原因',
  store_id STRING COMMENT '网点 ID',
  created_at TIMESTAMP COMMENT '创建时间',
  updated_at TIMESTAMP COMMENT '更新时间'
) 
PARTITIONED BY (ds STRING) 
STORED AS ALIORC  
TBLPROPERTIES ("columnar.nested.type"="true", "comment"="有效尝试派送详情") 
LIFECYCLE 36500;
"""

table_name, fields, table_comment = parse_table_ddl(ddl)
target_ddl = generate_target_table_ddl(table_name, fields, table_comment)
etl_sql = generate_etl_sql(table_name, fields, table_comment)

📝 输出示例

输入(源表 DDL)

CREATE TABLE IF NOT EXISTS ods_sap_store_cash_pay_info_di(
  id STRING COMMENT "主键",
  store_id STRING COMMENT "网点编号",
  business_date STRING COMMENT "业务日期",
  sap_state BIGINT COMMENT "0:待处理 1:待发送 2:不需要发送 3:已发送 4:异常",
  created_at TIMESTAMP COMMENT "创建时间",
  updated_at TIMESTAMP COMMENT "更新时间'
) 
PARTITIONED BY (ds STRING) 
STORED AS ALIORC 
TBLPROPERTIES ("columnar.nested.type"="true", "comment"="SAP 门店现金支付信息") 
LIFECYCLE 36500;

输出(目标表 DDL + ETL SQL)

-- 目标表 DDL
CREATE TABLE IF NOT EXISTS dwd_sap_store_cash_pay_info_di(
  id STRING COMMENT '主键',
  store_id STRING COMMENT '网点编号',
  business_date STRING COMMENT '业务日期',
  sap_state BIGINT COMMENT '0:待处理 1:待发送 2:不需要发送 3:已发送 4:异常',
  created_at STRING COMMENT '创建时间',
  updated_at STRING COMMENT '更新时间'
) 
PARTITIONED BY (ds STRING) 
STORED AS ALIORC  
TBLPROPERTIES ("columnar.nested.type"="true", "comment"="SAP 门店现金支付信息") 
LIFECYCLE 36500;

Metadata

Stars4473
Views0
Updated2026-05-01
View Author Profile
AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-alexmayanjun-collab-etl-generator": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.