article-tts

Convert article photos or plain text to audio: extract text from a photo via OCR, or accept text directly, and generate natural-sounding speech with Microsoft Edge TTS. Supports English and Chinese, automatic transcription, configurable speed and voice, and sentence-by-sentence splitting.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/54meteor/article-tts

Article TTS Skill

Default Configuration

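| Parameter | Default | Description |
|---|---|---|
| lang | en | Language: en or zh |
| skipConfirmation | false | Whether to skip the text confirmation step |
| speed | 90% | TTS speaking rate (--rate=-10% corresponds to 90%) |
| voice | en-US-EmmaNeural (English) / zh-CN-XiaoxiaoNeural (Chinese) | TTS voice |
| splitSentences | false | Whether to also generate one audio file per sentence |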
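
The defaults above could be captured in a small configuration object. This is a minimal sketch, not the skill's actual implementation: the class name `ArticleTTSConfig` and the snake_case field names are assumptions chosen for illustration, and only the default values come from the table.

```python
from dataclasses import dataclass

# Default voice per language, as listed in the configuration table.
DEFAULT_VOICES = {"en": "en-US-EmmaNeural", "zh": "zh-CN-XiaoxiaoNeural"}


@dataclass
class ArticleTTSConfig:
    """Hypothetical container for the documented defaults."""
    lang: str = "en"                  # "en" or "zh"
    skip_confirmation: bool = False   # skipConfirmation in the table
    speed: int = 90                   # percent of the normal speaking rate
    voice: str = ""                   # empty means "use the default for lang"
    split_sentences: bool = False     # splitSentences in the table

    def __post_init__(self):
        if self.lang not in DEFAULT_VOICES:
            raise ValueError(f"unsupported lang: {self.lang!r}")
        if not self.voice:
            self.voice = DEFAULT_VOICES[self.lang]
```

Leaving `voice` empty lets the language decide the voice, so `ArticleTTSConfig(lang="zh")` picks the Chinese voice without further arguments.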

Supported Languages

| Language | OCR language pack | TTS voice |
|---|---|---|
| en | eng (preinstalled) | en-US-EmmaNeural |
| zh | chi_sim (must be installed) | zh-CN-XiaoxiaoNeural |
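
One way to keep the OCR pack and the voice in step is a single lookup keyed by language code. The sketch below mirrors the two supported languages; `LanguageProfile` and `profile_for` are illustrative names, not part of the skill.

```python
from typing import NamedTuple


class LanguageProfile(NamedTuple):
    ocr_pack: str   # value passed to tesseract's -l option
    voice: str      # Edge TTS voice name


# Mirrors the supported-languages table.
LANGUAGES = {
    "en": LanguageProfile("eng", "en-US-EmmaNeural"),
    "zh": LanguageProfile("chi_sim", "zh-CN-XiaoxiaoNeural"),
}


def profile_for(lang: str) -> LanguageProfile:
    """Return the OCR pack and voice for a language code, or raise ValueError."""
    try:
        return LANGUAGES[lang]
    except KeyError:
        raise ValueError(
            f"unsupported lang {lang!r}; expected one of {sorted(LANGUAGES)}"
        ) from None
```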

Installing the Chinese OCR language pack:

  • Linux (WSL/Debian/Ubuntu): apt-get install tesseract-ocr-chi-sim
  • macOS: brew install tesseract-lang (includes Chinese)
  • Windows: download chi_sim.traineddata and place it in the tessdata folder of the Tesseract installation directory
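
Before running Chinese OCR it can help to verify that the pack is actually installed. The sketch below relies on `tesseract --list-langs`; the helper names are hypothetical, and the assumption is that the output consists of one header line followed by one language code per line.

```python
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output.

    The header line ("List of available languages ...") is skipped; every
    other non-empty line is treated as one language code.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return {line for line in lines if not line.lower().startswith("list of")}


def installed_languages() -> set[str]:
    """Run tesseract and return the set of installed language codes."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Assumption: some older builds print the list to stderr, so read both.
    return parse_list_langs(result.stdout + "\n" + result.stderr)
```

A caller could then check `"chi_sim" in installed_languages()` and point the user at the installation steps above when it is missing.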

Workflow

Input Types

  • Image: text is extracted via OCR (the language must be specified with lang)
  • Plain text: sent straight to TTS, no OCR needed
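
How the agent tells the two input types apart is not specified here. As a rough sketch, assuming images arrive as file paths, a suffix check could separate them from plain text; `classify_input` and the list of suffixes are hypothetical.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to whatever the agent actually accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff"}


def classify_input(value: str) -> str:
    """Return "image" if value looks like an image file path, else "text"."""
    candidate = value.strip()
    # A path is a single line; multi-line input is always treated as text.
    if "\n" not in candidate and Path(candidate).suffix.lower() in IMAGE_SUFFIXES:
        return "image"
    return "text"
```

This is only a heuristic: a one-line sentence that happens to end in a file name such as photo.jpg would be misclassified, so a real integration would more likely rely on the attachment type reported by the message channel.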

Standard Flow (default, requires confirmation)

Image → extract text via OCR → show the text to the user → user confirms → generate TTS → send
Text → generate TTS directly → send

Skip-Confirmation Flow ⚠️

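When the user says "no confirmation needed" (不需要确认) or "generate directly" (直接生成), the confirmation step is skipped.

⚠️ Security note: skipConfirmation skips the text confirmation step, so OCR-extracted text (which may contain sensitive information) is converted to audio and sent without review. Use it only for trusted sources and low-sensitivity content. Keeping it disabled by default (skipConfirmation: false) is recommended.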
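
The two flows and the confirmation gate can be sketched as a single function. Everything here is illustrative: `run_pipeline` and its callable parameters (`ocr`, `confirm`, `synthesize`, `send`) are placeholders for whatever the agent actually uses, passed in so the control flow can be read on its own.

```python
from typing import Callable, Optional


def run_pipeline(
    source: str,
    is_image: bool,
    ocr: Callable[[str], str],
    confirm: Callable[[str], bool],
    synthesize: Callable[[str], str],
    send: Callable[[str], None],
    skip_confirmation: bool = False,
) -> Optional[str]:
    """Run the image or text flow; return the audio path, or None if rejected."""
    if is_image:
        text = ocr(source)
        # Only OCR output is gated: it may contain misreads or sensitive text.
        if not skip_confirmation and not confirm(text):
            return None
    else:
        text = source  # plain text goes straight to TTS
    audio_path = synthesize(text)
    send(audio_path)
    return audio_path
```

With `skip_confirmation=False`, nothing is synthesized or sent unless `confirm` returns True, which matches the safety recommendation above.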

OCR Step

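# Image preprocessing (Python, requires Pillow)
from PIL import Image, ImageOps

img = Image.open(image_path)  # image_path: path to the input photo
# Convert to grayscale and stretch contrast, clipping 10% at each end of the histogram
img = ImageOps.autocontrast(img.convert('L'), cutoff=10)
# Upscale 4x so that small print is easier for Tesseract to read
w, h = img.size
img = img.resize((w*4, h*4), Image.LANCZOS)
img.save('/tmp/ocr_input.jpg', quality=99)

# OCR (shell): English
tesseract /tmp/ocr_input.jpg stdout -l eng --psm 4

# OCR (shell): Chinese
tesseract /tmp/ocr_input.jpg stdout -l chi_sim --psm 4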
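
The two tesseract invocations differ only in the language pack, so they can be driven from Python with one helper. A minimal sketch, assuming the image has already been preprocessed; `build_tesseract_cmd` and `ocr_image` are hypothetical names.

```python
import subprocess

# Language code -> tesseract language pack, per the supported-languages table.
OCR_PACKS = {"en": "eng", "zh": "chi_sim"}


def build_tesseract_cmd(image_path: str, lang: str = "en") -> list[str]:
    """Build the tesseract command for the given language."""
    pack = OCR_PACKS[lang]  # raises KeyError for unsupported languages
    # --psm 4: treat the page as a single column of variable-sized text.
    return ["tesseract", image_path, "stdout", "-l", pack, "--psm", "4"]


def ocr_image(image_path: str, lang: str = "en") -> str:
    """Run tesseract on a preprocessed image and return the extracted text."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting issues when the image path contains spaces.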

TTS Step

Full-text audio

# English
uvx edge-tts \
  -t "FULL TEXT" \
  -v en-US-EmmaNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/full_article.mp3

# Chinese
uvx edge-tts \
  -t "CHINESE TEXT" \
  -v zh-CN-XiaoxiaoNeural \
  --rate=-10% \
  --write-media OUTPUT_DIR/full_article.mp3
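
The speed setting is expressed as an absolute percentage (90%), while edge-tts takes a signed offset (--rate=-10%). The sketch below shows one way to convert between the two and assemble the command; the function names are illustrative, and only the flags already used above are assumed.

```python
import subprocess


def speed_to_rate(speed_percent: int) -> str:
    """Convert an absolute speed (90 means 90%) to an edge-tts --rate value."""
    delta = speed_percent - 100
    # edge-tts expects an explicit sign, e.g. "-10%" or "+0%".
    return f"{delta:+d}%"


def build_tts_cmd(text: str, voice: str, speed_percent: int, out_path: str) -> list[str]:
    """Build the uvx edge-tts invocation as an argument list."""
    return [
        "uvx", "edge-tts",
        "-t", text,
        "-v", voice,
        f"--rate={speed_to_rate(speed_percent)}",
        "--write-media", out_path,
    ]


def synthesize(text: str, voice: str, speed_percent: int, out_path: str) -> None:
    """Run edge-tts; raises CalledProcessError if the command fails."""
    subprocess.run(build_tts_cmd(text, voice, speed_percent, out_path), check=True)
```

For example, `speed_to_rate(90)` yields "-10%", matching the default in the commands above.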

Per-sentence splitting (only when splitSentences=true)

import re
import subprocess

def split_sentences(text, lang='en'):
    """Split text into sentences, keeping the closing punctuation."""
    if lang == 'zh':
        # Chinese: split after 。!? (full-width) or !? (ASCII)
        sentences = re.split(r'(?<=[。!?!?])\s*', text)
    else:
        # English: split after . ! ? when followed by whitespace
        sentences = re.split(r'(?<=[.!?])\s+', text)
    return [s.strip() for s in sentences if s.strip()]

# text and lang come from the earlier steps (OCR output or direct input)
voice = 'zh-CN-XiaoxiaoNeural' if lang == 'zh' else 'en-US-EmmaNeural'
sentences = split_sentences(text, lang=lang)
for i, sentence in enumerate(sentences, 1):
    num = str(i).zfill(2)  # 01, 02, ...
    subprocess.run([
        "uvx", "edge-tts",
        "-t", sentence,
        "-v", voice,
        "--rate=-10%",
        "--write-media", f"OUTPUT_DIR/sentence_{num}.mp3"
    ], check=True)  # raise if edge-tts fails
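
The two-digit padding in the loop above sorts correctly only up to 99 sentences; file number 100 would sort between 10 and 11 in a plain directory listing. For long articles, a possible variation is to derive the padding width from the sentence count. `sentence_filenames` is a hypothetical helper, not part of the skill.

```python
def sentence_filenames(count: int, min_width: int = 2) -> list[str]:
    """Return zero-padded names (sentence_01.mp3, ...) for `count` sentences."""
    # Widen the padding when the count needs more than min_width digits.
    width = max(min_width, len(str(count)))
    return [f"sentence_{i:0{width}d}.mp3" for i in range(1, count + 1)]
```

Short articles keep the documented sentence_01.mp3 naming, while an article with 100 or more sentences would get sentence_001.mp3 onward.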

Output Directory

/mnt/d/wslspace/workspace/articles/YYYY-MM-DD-article-slug/
├── original_text.md
├── full_article.mp3
└── sentence_01.mp3 ...
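
The directory name follows a YYYY-MM-DD-article-slug pattern. How the slug is derived is not stated, so the sketch below assumes a simple rule (lowercase, runs of non-word characters replaced by hyphens); `slugify` and `article_dir` are illustrative names.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def slugify(title: str) -> str:
    """Lowercase the title and join word runs with hyphens (assumed rule)."""
    slug = re.sub(r"\W+", "-", title.lower()).strip("-")
    return slug or "article"  # fall back when the title has no word characters


def article_dir(base: str, title: str, day: date) -> PurePosixPath:
    """Build BASE/YYYY-MM-DD-article-slug as a POSIX path."""
    return PurePosixPath(base) / f"{day.isoformat()}-{slugify(title)}"
```

Because `\W` is Unicode-aware in Python 3, Chinese titles keep their characters in the slug instead of collapsing to the fallback.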

Sending via Message Channel

The agent detects the active channel from the runtime context and calls message(...) accordingly. No hardcoded channel — the agent uses whichever channel the user is currently chatting through.
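
As a rough illustration of channel-agnostic sending, the sketch below reads the channel from a context mapping and forwards it to a `message` callable. The real `message(...)` signature and the shape of the runtime context are not documented here, so the `"channel"` key and the `channel=` and `file=` keyword names are pure assumptions.

```python
from typing import Any, Callable, Mapping


def send_audio(
    context: Mapping[str, Any],
    audio_path: str,
    message: Callable[..., Any],
) -> Any:
    """Send the audio through whichever channel the runtime context reports."""
    channel = context.get("channel")  # hypothetical key name
    if not channel:
        raise RuntimeError("no active channel in runtime context")
    # Keyword names are assumptions, not the real message(...) signature.
    return message(channel=channel, file=audio_path)
```

Passing `message` in as a parameter keeps the helper free of any hardcoded channel, in line with the behavior described above.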

Metadata

Author: @54meteor
Stars: 4473
Views: 0
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-54meteor-article-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
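
If clawhub.json already lists other plugins, pasting the snippet verbatim would overwrite them. A minimal sketch of merging the entry instead, assuming the file uses the top-level "plugins" object shown above; `enable_plugin` and `update_config_file` are hypothetical helpers.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-54meteor-article-tts"


def enable_plugin(config: dict) -> dict:
    """Return a copy of the config with the article-tts plugin entry added."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))  # keep existing plugin entries
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path: str = "clawhub.json") -> None:
    """Read the config file if present, add the plugin entry, and write it back."""
    file = Path(path)
    config = json.loads(file.read_text(encoding="utf-8")) if file.exists() else {}
    file.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```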
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.