
Wecom Doc Fetcher

Skill by mouzhi

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/mouzhi/wecom-doc-fetcher
wecom-doc-fetcher

Use this skill when the user wants to save any page from the WeChat Work (企业微信) developer documentation site (developer.work.weixin.qq.com/document/path/*) as a clean Markdown file in their Obsidian vault.

Files in this skill

wecom-doc-fetcher/
├── SKILL.md          # this file
└── wx_doc_fetch.py   # the fetch & convert script

Setup (one-time)

Run these once before using the skill:

pip install requests playwright
playwright install chromium

The playwright install chromium command downloads a headless Chromium binary of roughly 150 MB. Automatic doc_id detection requires it.

Python 3.8+ is required.
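
A quick way to confirm the setup before the first run is sketched below. The helper name missing_prerequisites is hypothetical and not part of the skill; it only checks the Python version and that the two packages are importable, and it does not verify that the Chromium binary was downloaded.

```python
import importlib.util
import sys


def missing_prerequisites(modules=("requests", "playwright"), min_version=(3, 8)):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper for illustration only, not part of wx_doc_fetch.py.
    """
    problems = []
    if sys.version_info < min_version:
        problems.append("Python %d.%d+ is required" % min_version)
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: " + name)
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("[setup]", problem)
```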


Usage

Place wx_doc_fetch.py anywhere convenient (e.g. your vault's scripts folder), then run:

# Basic: auto-detect doc_id, print to stdout
python wx_doc_fetch.py <URL>

# Save to file
python wx_doc_fetch.py <URL> output.md

# Skip Playwright, supply doc_id manually
python wx_doc_fetch.py <URL> output.md --doc-id <integer>

# Override cookies at runtime
python wx_doc_fetch.py <URL> output.md --cookies "wwapidoc.sid=xxx; ..."
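
The command forms above imply one required URL, one optional output path, and two optional flags. A minimal argparse sketch of that interface follows; it is an assumption about how such a parser could look, not the parser that wx_doc_fetch.py actually uses, and the help strings are illustrative.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command forms (illustrative only)."""
    parser = argparse.ArgumentParser(prog="wx_doc_fetch.py")
    parser.add_argument("url", help="documentation page URL")
    # Without an output path, the content goes to stdout.
    parser.add_argument("output", nargs="?", default=None, help="Markdown file to write")
    parser.add_argument("--doc-id", type=int, default=None,
                        help="skip Playwright and use this doc_id")
    parser.add_argument("--cookies", default=None,
                        help="cookie string to use instead of the automatic session")
    return parser
```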

Example

python wx_doc_fetch.py https://developer.work.weixin.qq.com/document/path/94677 发送消息.md
# [info] path_id=94677  doc_id=31152
# [done] 已写入:发送消息.md
# (已写入 means "written to"; the file name 发送消息.md means "Send Message.md")

How It Works

The WeChat Work docs site is a Vue SPA — the visible content is not in the initial HTML. It is loaded at runtime via a private POST API:

POST https://developer.work.weixin.qq.com/docFetch/fetchCnt?lang=zh_CN&ajax=1&f=json
Body: doc_id=<integer>   (application/x-www-form-urlencoded)

The response includes data.content_md — the page content as a Markdown string. The script fetches this field, cleans it, and writes the result.
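
The request and response handling described above can be sketched as follows. This is a standard-library illustration, not the code in wx_doc_fetch.py (which installs requests); the function names are hypothetical, and the response shape is assumed to be only what the text states, namely a JSON object with data.content_md.

```python
import json
import urllib.parse
import urllib.request

FETCH_URL = (
    "https://developer.work.weixin.qq.com/docFetch/fetchCnt"
    "?lang=zh_CN&ajax=1&f=json"
)


def build_fetch_request(doc_id, cookies=""):
    """Build (without sending) the form-encoded POST for a given doc_id."""
    body = urllib.parse.urlencode({"doc_id": doc_id}).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if cookies:
        headers["Cookie"] = cookies
    return urllib.request.Request(FETCH_URL, data=body, headers=headers, method="POST")


def extract_content_md(payload):
    """Return data.content_md from a decoded JSON response, or raise ValueError."""
    data = payload.get("data") or {}
    content = data.get("content_md")
    if not isinstance(content, str):
        raise ValueError("response has no data.content_md string")
    return content


def fetch_markdown(doc_id, cookies=""):
    """Send the request and return the Markdown string (needs network access)."""
    request = build_fetch_request(doc_id, cookies)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_content_md(json.load(response))
```

Keeping request construction and response parsing separate from the network call makes both pieces easy to check offline.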

Why not WebFetch / defuddle?

The page renders client-side. WebFetch and defuddle only see the pre-JS HTML skeleton — no content. Scraping innerText via browser tools works but produces a very large accessibility tree with poor formatting. The content_md API field is the cleanest, most token-efficient source.

URL path ID ≠ doc_id

The number in the browser URL (e.g. 94677) is a routing slug — not the doc_id the API needs. The actual doc_id (e.g. 31152) is determined at runtime by loading the page with Playwright and intercepting the fetchCnt XHR request.
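
A rough sketch of that detection step, using Playwright's synchronous API, is shown below. The function names and the 30-second timeout are assumptions made for illustration, and the script's real implementation may differ. The Playwright import sits inside detect_doc_id so the two parsing helpers work without it.

```python
import re
from urllib.parse import parse_qs, urlparse


def path_id_from_url(url):
    """Extract the routing slug, e.g. 94677 from .../document/path/94677."""
    match = re.search(r"/document/path/(\d+)", urlparse(url).path)
    if match is None:
        raise ValueError("not a /document/path/<id> URL: " + url)
    return int(match.group(1))


def doc_id_from_post_data(post_data):
    """Read doc_id from a form-encoded request body such as 'doc_id=31152'."""
    values = parse_qs(post_data or "").get("doc_id")
    if not values:
        raise ValueError("no doc_id field in request body")
    return int(values[0])


def detect_doc_id(url, timeout_ms=30000):
    """Load the page headlessly and read doc_id from the fetchCnt request."""
    # Imported lazily so the helpers above do not require Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait for the first request whose URL mentions fetchCnt.
            with page.expect_request(
                lambda request: "fetchCnt" in request.url, timeout=timeout_ms
            ) as request_info:
                page.goto(url)
            return doc_id_from_post_data(request_info.value.post_data)
        finally:
            browser.close()
```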


Manual doc_id Fallback

If Playwright is unavailable or times out:

  1. Open the target URL in Chrome
  2. DevTools → Network tab → filter by fetchCnt
  3. Click the request → Payload tab
  4. Read the doc_id value
  5. Pass it with --doc-id:
python wx_doc_fetch.py https://developer.work.weixin.qq.com/document/path/94677 发送消息.md --doc-id 31152

Cookie Configuration

The fetchCnt API requires an authenticated session. When Playwright's headless browser loads the page, it obtains session cookies automatically, so normal use needs no manual cookie setup. To supply your own session instead, pass --cookies as shown under Usage.
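
Converting between a cookie string like the one passed to --cookies and Playwright's cookie records can be sketched as follows. Playwright's context.cookies() returns a list of dicts with name and value keys; the helper names here are hypothetical, and how the script handles cookies internally is not documented in this file.

```python
def cookie_header(cookies):
    """Join Playwright-style cookie dicts into a single Cookie header value."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


def parse_cookie_string(raw):
    """Split a 'name=value; name2=value2' string into a dict."""
    pairs = {}
    for part in raw.split(";"):
        name, sep, value = part.strip().partition("=")
        # Skip empty fragments and fragments without an '=' sign.
        if sep and name:
            pairs[name] = value
    return pairs
```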

Metadata

Author: @mouzhi
Stars: 1401
Views: 0
Updated: 2026-02-24

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-mouzhi-wecom-doc-fetcher": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.