
Doc Genius

Skill by imgolye


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/imgolye/doc-genius

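name: doc-genius
version: 1.2.0
description: Intelligent document-processing assistant that supports smart summarization of PDF, Word, and Markdown files, format conversion, and batch processing. Use cases: (1) smart document summarization (2) PDF/Word to Markdown conversion (3) batch document processing (4) document format conversion. Triggers: "文档摘要", "PDF转Markdown", "Word转换", "文档处理", "批量文档", "智能摘要", "format conversion".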

Doc Genius - Intelligent Document Processing Assistant

Quick Start

Smart Summarization

# Summarize a PDF
python3 scripts/doc_processor.py summarize /path/to/document.pdf

# Summarize a Word document
python3 scripts/doc_processor.py summarize /path/to/document.docx

# Summarize a Markdown file
python3 scripts/doc_processor.py summarize /path/to/document.md --format json

Format Conversion

# PDF → Markdown
python3 scripts/doc_processor.py convert /path/to/document.pdf --output markdown

# Word → Markdown
python3 scripts/doc_processor.py convert /path/to/document.docx --output markdown

# Markdown → HTML
python3 scripts/doc_processor.py convert /path/to/document.md --output html

Batch Processing

# Convert every document in a folder
python3 scripts/doc_processor.py batch /path/to/folder --output markdown

# Summarize every document in a folder
python3 scripts/doc_processor.py batch /path/to/folder --action summarize

Output Formats

JSON format (default)

{
  "file": "document.pdf",
  "type": "pdf",
  "summary": "This is the smart summary of the document...",
  "keywords": ["keyword1", "keyword2"],
  "word_count": 5000,
  "pages": 12
}
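
A script that consumes this output might load it into a typed record. The sketch below assumes each result carries exactly the fields shown in the sample JSON; `DocSummary` and `parse_summary` are hypothetical names for illustration, not part of Doc Genius.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class DocSummary:
    """Hypothetical record mirroring the sample JSON output."""
    file: str
    type: str
    summary: str
    keywords: List[str]
    word_count: int
    pages: int


def parse_summary(raw: str) -> DocSummary:
    """Parse one JSON result; raises KeyError if an expected field is missing."""
    data = json.loads(raw)
    return DocSummary(
        file=data["file"],
        type=data["type"],
        summary=data["summary"],
        keywords=list(data["keywords"]),
        word_count=int(data["word_count"]),
        pages=int(data["pages"]),
    )


# Sample input shaped like the documented output (summary text elided).
sample = """{
  "file": "document.pdf",
  "type": "pdf",
  "summary": "...",
  "keywords": ["keyword1", "keyword2"],
  "word_count": 5000,
  "pages": 12
}"""
result = parse_summary(sample)
print(result.file, result.word_count, result.pages)
```

Failing loudly on a missing field is a deliberate choice here: it surfaces schema drift early instead of silently producing partial records.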

Markdown format

python3 scripts/doc_processor.py summarize document.pdf --format markdown

Core Features

1. Smart Summarization

Supported formats:

  • ✅ PDF (PyPDF2)
  • ✅ Word (.docx)
  • ✅ Markdown
  • ✅ Plain text

Summarization methods:

  • Local summary (TextRank; fast)
  • AI summary (OpenAI API; higher quality)

Examples:

# Local summary
python3 scripts/doc_processor.py summarize document.pdf --method local

# AI summary (requires a configured API key)
export OPENAI_API_KEY="sk-xxx"
python3 scripts/doc_processor.py summarize document.pdf --method ai
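
To give a feel for what a local TextRank summary does, here is a minimal extractive sketch in plain Python. It is not the code in `doc_processor.py`: the sentence splitter, the word-overlap similarity (in the spirit of the TextRank paper), and names such as `textrank_summary` are all assumptions made for illustration.

```python
import math
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def similarity(a: Set[str], b: Set[str]) -> float:
    """Shared words, normalised by the log of each sentence's vocabulary size."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_summary(text: str, top_n: int = 2, damping: float = 0.85,
                     iterations: int = 50) -> List[str]:
    """Rank sentences with PageRank over a similarity graph; return the top_n in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional to edge weight.
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]


text = ("Cats chase mice in the barn. Mice hide from cats in the barn. "
        "The barn shelters cats and mice. Quarterly revenue grew nine percent.")
summary = textrank_summary(text, top_n=3)
print(summary)
```

Sentences that share vocabulary with many others accumulate score, so the unrelated revenue sentence ranks last and is dropped. Because no network call is involved, this style of summary is fast, which matches the trade-off listed above.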

2. Format Conversion

Conversion matrix:

Input format    Output format    Status
PDF             Markdown
PDF             HTML             ⚠️ Experimental
Word            Markdown
Word            HTML
Markdown        HTML
Markdown        Word             🔜 Planned
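
A caller could encode this matrix as a lookup table and check a pair before invoking the converter. The sketch below is an illustration only: `CONVERSIONS` and `can_convert` are hypothetical names, rows that carry no status flag in the matrix are stored as `None`, and treating those rows as usable is an assumption.

```python
from typing import Dict, Optional, Tuple

# "experimental" and "planned" come from the matrix;
# None marks rows that are listed without a status flag.
CONVERSIONS: Dict[Tuple[str, str], Optional[str]] = {
    ("pdf", "markdown"): None,
    ("pdf", "html"): "experimental",
    ("word", "markdown"): None,
    ("word", "html"): None,
    ("markdown", "html"): None,
    ("markdown", "word"): "planned",
}


def can_convert(src: str, dst: str) -> bool:
    """True if the pair is listed and not merely planned (assumed reading of the matrix)."""
    key = (src.lower(), dst.lower())
    return key in CONVERSIONS and CONVERSIONS[key] != "planned"


print(can_convert("PDF", "Markdown"))
print(can_convert("Markdown", "Word"))
```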

Examples:

# PDF → Markdown (recommended)
python3 scripts/doc_processor.py convert report.pdf --output markdown

# Word → HTML
python3 scripts/doc_processor.py convert report.docx --output html

3. Batch Processing

Features:

  • Folder scanning
  • Concurrent processing
  • Progress reporting
  • Error logging

Examples:

# Batch conversion (default: 5 concurrent workers)
python3 scripts/doc_processor.py batch /path/to/docs --output markdown

# Set the number of workers
python3 scripts/doc_processor.py batch /path/to/docs --output markdown --workers 10

# Generate a report
python3 scripts/doc_processor.py batch /path/to/docs --action summarize --report report.json
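
The four batch features (scanning, concurrency, progress, error logging) can be sketched with the standard library's thread pool. This is not the tool's implementation: `scan`, `run_batch`, the `SUPPORTED` extension set, and the word-counting placeholder worker are assumptions chosen so the example runs on its own. Only the default of 5 workers is taken from this document.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, List

SUPPORTED = {".pdf", ".docx", ".md", ".txt"}  # assumed from the formats listed in this document


def scan(folder: Path) -> List[Path]:
    """Recursively collect files with a supported extension, in sorted order."""
    return sorted(p for p in folder.rglob("*")
                  if p.is_file() and p.suffix.lower() in SUPPORTED)


def count_words(path: Path) -> Dict[str, object]:
    """Placeholder worker standing in for real summarization or conversion."""
    text = path.read_text(encoding="utf-8")
    return {"file": path.name, "word_count": len(text.split())}


def run_batch(folder: Path, worker: Callable[[Path], Dict[str, object]],
              workers: int = 5) -> Dict[str, list]:
    """Run worker over every supported file, collecting results and errors separately."""
    files = scan(folder)
    report: Dict[str, list] = {"results": [], "errors": []}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(worker, f): f for f in files}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                report["results"].append(future.result())
            except Exception as exc:  # record the failure and keep going
                report["errors"].append({"file": path.name, "error": str(exc)})
            print(f"[{done}/{len(files)}] {path.name}")  # progress line
    return report


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("one two three", encoding="utf-8")
    (root / "b.txt").write_text("four five", encoding="utf-8")
    (root / "c.pdf").write_bytes(b"\xff\xfe binary, not valid UTF-8 \xff")
    (root / "notes.log").write_text("ignored extension", encoding="utf-8")
    report = run_batch(root, count_words)
    print(json.dumps(report, indent=2))
```

One failing file (here the binary `c.pdf`, which the placeholder worker cannot decode) ends up in `errors` without aborting the rest of the batch, which is the behavior an error log is meant to support.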

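4. Structured Extraction (Experimental)

Extracted content:

  • Heading hierarchy
  • Table of contents
  • Key information (dates, amounts, personal names)

Example:

python3 scripts/doc_processor.py extract document.pdf --fields title,toc,dates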
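
As a rough illustration of this kind of extraction, the sketch below pulls headings, a table of contents, ISO dates, and currency amounts out of Markdown text with regular expressions. The patterns and the `extract_fields` name are assumptions, not the tool's logic; personal names are left out because they generally need named-entity recognition rather than a regex.

```python
import re
from typing import Dict, List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$", re.MULTILINE)
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")      # only YYYY-MM-DD is recognised
AMOUNT = re.compile(r"[$¥€]\s?\d[\d,]*(?:\.\d+)?")   # currency symbol followed by a number


def extract_fields(markdown: str) -> Dict[str, list]:
    """Return title, toc, dates and amounts found in a Markdown string."""
    headings: List[Tuple[int, str]] = [
        (len(m.group(1)), m.group(2)) for m in HEADING.finditer(markdown)
    ]
    # Indent each TOC entry by two spaces per heading level below the first.
    toc = ["  " * (level - 1) + title for level, title in headings]
    return {
        "title": [title for level, title in headings if level == 1],
        "toc": toc,
        "dates": ISO_DATE.findall(markdown),
        "amounts": AMOUNT.findall(markdown),
    }


sample = """# Annual Report
## Summary
Signed on 2024-03-15 for $1,200.50.
## Outlook
Review due 2025-01-31.
"""
fields = extract_fields(sample)
print(fields)
```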

Advanced Usage

Using AI Summaries

# Set the API key
export OPENAI_API_KEY="sk-xxx"

# AI summary (smarter)
python3 scripts/doc_processor.py summarize document.pdf --method ai --model gpt-4
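
A wrapper script might pick the summarization method automatically, based on whether a key is configured. The sketch below only builds the argument list from the flags shown in this document (`--method`, `--model`); the fallback-to-local policy and the `build_summarize_argv` helper are assumptions, not documented behavior of the tool.

```python
import os
from typing import List, Mapping, Optional


def build_summarize_argv(path: str, env: Optional[Mapping[str, str]] = None,
                         model: str = "gpt-4") -> List[str]:
    """Use --method ai only when OPENAI_API_KEY is set; otherwise use --method local."""
    env = os.environ if env is None else env
    argv = ["python3", "scripts/doc_processor.py", "summarize", path]
    if env.get("OPENAI_API_KEY"):
        argv += ["--method", "ai", "--model", model]
    else:
        argv += ["--method", "local"]
    return argv


no_key = build_summarize_argv("document.pdf", env={})
with_key = build_summarize_argv("document.pdf", env={"OPENAI_API_KEY": "sk-xxx"})
print(no_key)
print(with_key)
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the skill's root directory.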

Custom Output

# Specify the output file
python3 scripts/doc_processor.py convert document.pdf --output markdown --out-file output.md

Metadata

Author: @imgolye
Stars: 2287
Views: 0
Updated: 2026-03-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-imgolye-doc-genius": {
      "enabled": true,
      "auto_update": true
    }
  }
}
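
If clawhub.json already lists other plugins, the entry needs to be merged rather than pasted over the file. The sketch below shows one way to do that on an in-memory dictionary; `enable_plugin` and the `some-other-plugin` entry are hypothetical, and the only schema assumed is the `plugins` object shown in the snippet.

```python
import json
from typing import Any, Dict


def enable_plugin(config: Dict[str, Any], plugin_id: str,
                  auto_update: bool = True) -> Dict[str, Any]:
    """Return a copy of config with the plugin entry added, keeping existing plugins."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    merged["plugins"] = plugins
    return merged


# Hypothetical existing configuration with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": False}}}
updated = enable_plugin(existing, "official-imgolye-doc-genius")
print(json.dumps(updated, indent=2))
```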

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.