Back to Registry
View Author Profile
Official Verified
etl-generator
大数据 ETL 流程生成器 - 根据源表 DDL 生成标准化 ETL 加工 SQL(HiveSQL/MySQL)
skill-install — Terminal
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/alexmayanjun-collab/etl-generatorOr
ETL 流程生成器 - 大数据专家版
根据源表 DDL 自动生成标准化的 ETL 加工 SQL,支持 HiveSQL、MySQL、ODPS。
🎯 角色定位
大数据专家(20 年经验)
- 精通 HiveSQL、MySQL、Shell、Python
- 严格遵守大数据 ETL 加工规范
- 注意字段类型转换、时区处理、数据质量
🔧 核心功能
1️⃣ 表名规范
- 源表:
ods_[表名]_di - 目标表:
dwd_[表名]_di
2️⃣ 字段类型转换
_at或_time结尾的 TIMESTAMP 字段 → STRING(时区转换)_date结尾的字段 → STRING(不转换)- 其他字段 → 保持原类型
3️⃣ 时区转换
DATE_FORMAT(FROM_UTC_TIMESTAMP(created_at, "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS created_at
DATE_FORMAT(FROM_UTC_TIMESTAMP(updated_at, "${timezone}"), "yyyy-MM-dd HH:mm:ss.SSS") AS updated_at
4️⃣ 分区字段
DATE_FORMAT(FROM_UTC_TIMESTAMP(created_at, "${timezone}"), "yyyy-MM-dd") AS ds
5️⃣ 增量数据处理
- 使用
ods_data_base_di表 - 支持 INSERT/UPDATE/DELETE 操作
- 通过
_operation_和_after_image_识别
6️⃣ 去重逻辑
ROW_NUMBER() OVER(PARTITION BY id ORDER BY updated_at DESC) as rn
WHERE rn = 1
7️⃣ 字段排除 rn
SELECT `(rn)?+.+` FROM (...)
📋 使用方式
方式 1:命令行
# 从文件读取 DDL
python3 skills/etl-generator/etl_generator.py source_table.ddl > etl_sql.sql
# 从标准输入读取
cat source_table.ddl | python3 skills/etl-generator/etl_generator.py > etl_sql.sql
方式 2:直接调用
from etl_generator import parse_table_ddl, generate_target_table_ddl, generate_etl_sql
ddl = """
CREATE TABLE IF NOT EXISTS ods_delivery_attempt_di(
id STRING COMMENT '主键',
pno STRING COMMENT '运单号',
client_id STRING COMMENT '客户 ID',
returned BIGINT COMMENT '是否退货件',
delivery_date STRING COMMENT '派送日期',
marker_id BIGINT COMMENT '标记原因',
store_id STRING COMMENT '网点 ID',
created_at TIMESTAMP COMMENT '创建时间',
updated_at TIMESTAMP COMMENT '更新时间'
)
PARTITIONED BY (ds STRING)
STORED AS ALIORC
TBLPROPERTIES ("columnar.nested.type"="true", "comment"="有效尝试派送详情")
LIFECYCLE 36500;
"""
table_name, fields, table_comment = parse_table_ddl(ddl)
target_ddl = generate_target_table_ddl(table_name, fields, table_comment)
etl_sql = generate_etl_sql(table_name, fields, table_comment)
📝 输出示例
输入(源表 DDL)
CREATE TABLE IF NOT EXISTS ods_sap_store_cash_pay_info_di(
id STRING COMMENT "主键",
store_id STRING COMMENT "网点编号",
business_date STRING COMMENT "业务日期",
sap_state BIGINT COMMENT "0:待处理 1:待发送 2:不需要发送 3:已发送 4:异常",
created_at TIMESTAMP COMMENT "创建时间",
updated_at TIMESTAMP COMMENT "更新时间'
)
PARTITIONED BY (ds STRING)
STORED AS ALIORC
TBLPROPERTIES ("columnar.nested.type"="true", "comment"="SAP 门店现金支付信息")
LIFECYCLE 36500;
输出(目标表 DDL + ETL SQL)
-- 目标表 DDL
CREATE TABLE IF NOT EXISTS dwd_sap_store_cash_pay_info_di(
id STRING COMMENT '主键',
store_id STRING COMMENT '网点编号',
business_date STRING COMMENT '业务日期',
sap_state BIGINT COMMENT '0:待处理 1:待发送 2:不需要发送 3:已发送 4:异常',
created_at STRING COMMENT '创建时间',
updated_at STRING COMMENT '更新时间'
)
PARTITIONED BY (ds STRING)
STORED AS ALIORC
TBLPROPERTIES ("columnar.nested.type"="true", "comment"="SAP 门店现金支付信息")
LIFECYCLE 36500;
Metadata
AI Skill Finder
Not sure this is the right skill?
Describe what you want to build — we'll match you to the best skill from 16,000+ options.
Find the right skill Add to Configuration
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-alexmayanjun-collab-etl-generator": {
"enabled": true,
"auto_update": true
}
}
}Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.
Related Skills
aliyun-codeup
阿里云云效 Codeup 代码仓库管理 - 查询项目、分支、提交记录等
alexmayanjun-collab 4473
Dataworks Daily Monitor
Skill by alexmayanjun-collab
alexmayanjun-collab 4473
dataworks-smart-monitor
DataWorks 智能监控技能 - 异步分析任务运行情况,智能告警分级(不阻塞主会话)
alexmayanjun-collab 4473
dev-machine-cleanup
开发机清理技能 - 异步清理日志和临时文件(不阻塞主会话)
alexmayanjun-collab 4473
Dev Machine Database
Skill by alexmayanjun-collab
alexmayanjun-collab 4473