ClawKit Reliability Toolkit
Official · Verified · productivity · Safety 3/5

fetch-archive-to-lexiang

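A general-purpose article fetching and archiving tool. It fetches the full text of articles at any URL (free, paywalled, or behind a login wall), converts it into structured Markdown, and can optionally save it to a LeXiang (乐享) knowledge base. It manages login sessions for paid platforms such as Substack, Medium, and Zhishi Xingqiu (知识星球). It also supports YouTube video downloads (yt-dlp), podcast audio downloads (Xiaoyuzhou FM and others), audio transcription (Whisper), and translation (in a side-by-side Chinese-English, 中英对照, format), and it uploads the audio/video files and transcripts to LeXiang (transcripts are stored as online documents that support block-level editing). Trigger keywords: 抓取文章 (fetch article), 获取全文 (get full text), 付费文章 (paywalled article), 转存知识库 (save to knowledge base), 乐享 (LeXiang), 保存原文 (save original text), fetch article, 归档 (archive), YouTube, 视频转录 (video transcription), 字幕提取 (subtitle extraction), 视频下载 (video download), 播客 (podcast), podcast, 小宇宙 (Xiaoyuzhou), xiaoyuzhou.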
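The exact layout of the side-by-side Chinese-English (中英对照) transcript is not specified here. As a minimal sketch, assuming each transcript segment carries a start time, the original text, and its translation (the `Segment` type and `to_bilingual_markdown` function are hypothetical, not part of the skill), an interleaved Markdown transcript could be built like this:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment (hypothetical structure, not the skill's own)."""
    start: float       # start time in seconds
    original: str      # text as transcribed, e.g. by Whisper
    translation: str   # translated text for the same segment


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_bilingual_markdown(title: str, segments: list[Segment]) -> str:
    """Interleave original and translated text, one block per segment."""
    lines = [f"# {title}", ""]
    for seg in segments:
        # Original line, prefixed with a bold timestamp.
        lines.append(f"**[{format_timestamp(seg.start)}]** {seg.original.strip()}")
        lines.append("")
        # Translation directly below it, as a blockquote.
        lines.append(f"> {seg.translation.strip()}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    demo = [
        Segment(0.0, "Welcome to the show.", "欢迎收听本期节目。"),
        Segment(75.4, "Today we talk about AI agents.", "今天我们聊聊 AI 智能体。"),
    ]
    print(to_bilingual_markdown("Episode 1", demo))
```

Keeping each original line and its translation adjacent, rather than producing two separate documents, is what makes the transcript easy to read side by side and to edit block by block.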

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/ajaxhe/fetch-archive-to-lexiang

What This Skill Does

The fetch-archive-to-lexiang skill is an all-in-one automation agent for content curation and knowledge management: it turns articles and media scattered across the web into structured entries in a personal knowledge base. Beyond simple web scraping, it handles harder cases such as paywalled articles, sites with anti-bot protection like Cloudflare, and multimedia content such as YouTube videos and podcasts. It uses browser automation (Playwright and the Chrome DevTools Protocol, CDP) to capture pages with high fidelity, including images and metadata, and can upload the results to a LeXiang (乐享) knowledge base for long-term archiving.
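Whether images survive depends on how the captured HTML is turned into Markdown. The skill's own converter is not described on this page; the following is a minimal standard-library sketch under that caveat (the `HtmlToMarkdown` class is hypothetical and handles only headings, paragraphs, links, and images) showing the idea of keeping `<img>` tags as Markdown image links:

```python
from html.parser import HTMLParser
from typing import Optional


class HtmlToMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter: h1-h3, p, a, and img only."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.blocks: list[str] = []      # finished Markdown blocks
        self.current: list[str] = []     # pieces of the block being built
        self.prefix = ""                 # heading marker for the current block
        self.in_link = False             # inside an <a> element?
        self.href: Optional[str] = None  # target of the open <a>
        self.link_text: list[str] = []   # text collected inside the <a>

    def _target(self) -> list[str]:
        return self.link_text if self.in_link else self.current

    def _flush(self) -> None:
        text = "".join(self.current).strip()
        if text:
            self.blocks.append(self.prefix + text)
        self.current, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HEADINGS or tag == "p":
            self._flush()
            self.prefix = self.HEADINGS.get(tag, "")
        elif tag == "img" and attrs.get("src"):
            # Keep images as Markdown image links so they survive archiving.
            alt = attrs.get("alt") or ""
            self._target().append(f"![{alt}]({attrs['src']})")
        elif tag == "a":
            self.in_link, self.href, self.link_text = True, attrs.get("href"), []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self._flush()
        elif tag == "a" and self.in_link:
            text = "".join(self.link_text)
            self.current.append(f"[{text}]({self.href})" if self.href else text)
            self.in_link = False

    def handle_data(self, data):
        self._target().append(data)

    def convert(self, html: str) -> str:
        self.feed(html)
        self.close()
        self._flush()
        return "\n\n".join(self.blocks) + "\n"


if __name__ == "__main__":
    page = '<h1>Title</h1><p>See <a href="https://example.com">this</a>.</p><p><img src="chart.png" alt="Chart"></p>'
    print(HtmlToMarkdown().convert(page))
```

A real capture pipeline would also need lists, tables, code blocks, and downloading the referenced image files; this sketch only shows the structural mapping.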

Installation

To install this skill, use the following command in your OpenClaw terminal: clawhub install openclaw/skills/skills/ajaxhe/fetch-archive-to-lexiang

Use Cases

  • Professional Research: Automatically archive paywalled reports from sites like The Information or Caixin, preserving full formatting and images.
  • Multimedia Preservation: Automatically convert YouTube technical tutorials or podcasts from Xiaoyuzhou into searchable, translated Markdown transcripts with local media backups.
  • Content Curation: Extract full-text content from newsletters (Substack) and member-only platforms, ensuring a centralized, clean backup of your reading list.
  • Knowledge Base Synchronization: Regularly feed research materials directly into a LeXiang knowledge base, categorizing content by date and source for efficient retrieval.

Example Prompts

  1. "Fetch this Substack article (https://substack.com/post/123) and archive it to my LeXiang knowledge base with full images."
  2. "Download the latest podcast from this Xiaoyuzhou URL, transcribe it using Whisper, translate it to English, and save the transcript as a structured document in LeXiang."
  3. "Grab this paywalled article from the tech blog using CDP mode because it's behind a Cloudflare wall, and extract the full content into a markdown file."

Tips & Limitations

  • Use CDP for Bot-Protected Sites: Always use --cdp when targeting sites with strict bot protection, such as OpenAI or LinkedIn, and make sure your local Chrome browser is running with --remote-debugging-port=9222.
  • Prioritize fetch_article.py: Avoid the generic web_fetch tool for visual content; use fetch_article.py whenever you need to keep images, charts, or screenshots.
  • File Naming: The tool names output files automatically, replacing characters from the URL that are not valid in file names with hyphens, and it also generates _meta.json files; keep these alongside the Markdown output so the archive stays organized.
  • Dependencies: This tool requires a local Python environment and persistent Chrome access for advanced authentication workflows.
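Before running with --cdp, it can help to confirm that Chrome's debugging endpoint is reachable at all. Chrome launched with --remote-debugging-port serves browser metadata as JSON at /json/version, including a webSocketDebuggerUrl field that CDP clients connect to. A minimal standard-library check, assuming the default port 9222 on localhost (the `cdp_websocket_url` helper is hypothetical, not part of the skill):

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def cdp_websocket_url(host: str = "127.0.0.1", port: int = 9222,
                      timeout: float = 2.0) -> Optional[str]:
    """Return Chrome's browser-level DevTools WebSocket URL, or None.

    Queries http://<host>:<port>/json/version, which Chrome serves when it
    was started with --remote-debugging-port. Any connection or parsing
    failure is treated as "no debuggable Chrome here".
    """
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
    if not isinstance(info, dict):
        return None
    return info.get("webSocketDebuggerUrl")


if __name__ == "__main__":
    ws = cdp_websocket_url()
    print(ws if ws else "Chrome is not listening on port 9222")
```

If it returns None, start Chrome with the flag first (for example `google-chrome --remote-debugging-port=9222`; the executable name differs by platform). Recent Chrome releases may ignore the flag on the default profile unless a separate --user-data-dir is also given.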

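The exact naming rule and the fields inside _meta.json are not documented on this page. As a hypothetical sketch (the `url_to_filename` and `write_capture` helpers and the metadata keys are assumptions, not the skill's actual format), replacing unsafe URL characters with hyphens and writing a sidecar metadata file could look like this:

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlsplit

# Anything other than word characters (Unicode-aware), dots, and hyphens
# is treated as unsafe for a file name and collapsed into one hyphen.
_UNSAFE = re.compile(r"[^\w.-]+")


def url_to_filename(url: str, max_len: int = 120) -> str:
    """Turn a URL into a file-name stem (host + path, unsafe chars -> '-')."""
    parts = urlsplit(url)
    stem = _UNSAFE.sub("-", parts.netloc + parts.path).strip("-.")
    return stem[:max_len] or "untitled"


def write_capture(out_dir: Path, url: str, title: str, markdown: str) -> Path:
    """Write <stem>.md and a <stem>_meta.json sidecar; return the .md path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = url_to_filename(url)
    md_path = out_dir / f"{stem}.md"
    md_path.write_text(markdown, encoding="utf-8")
    meta = {
        "source_url": url,
        "title": title,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "markdown_file": md_path.name,
    }
    meta_path = out_dir / f"{stem}_meta.json"
    meta_path.write_text(json.dumps(meta, ensure_ascii=False, indent=2),
                         encoding="utf-8")
    return md_path


if __name__ == "__main__":
    print(url_to_filename("https://example.substack.com/p/my-post?utm=1"))
```

The query string is dropped on purpose here so that tracking parameters do not produce duplicate files for the same article; the original URL is still kept in the metadata.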
Metadata

Author: @ajaxhe
Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-ajaxhe-fetch-archive-to-lexiang": {
      "enabled": true,
      "auto_update": true
    }
  }
}
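If clawhub.json already lists other plugins, pasting the snippet above verbatim can clobber or duplicate the existing "plugins" object. A small sketch for merging the entry instead, assuming clawhub.json is a plain JSON object shaped like that snippet (the `enable_plugin` helper is hypothetical, not a ClawHub command):

```python
import json
from pathlib import Path

PLUGIN_ID = "official-ajaxhe-fetch-archive-to-lexiang"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID,
                  auto_update: bool = True) -> dict:
    """Add or update one plugin entry, leaving all other settings intact."""
    config: dict = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    plugins = config.setdefault("plugins", {})
    # Update in place so any extra keys on an existing entry are preserved.
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": auto_update})
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    print(json.dumps(enable_plugin(Path("clawhub.json")), indent=2))
```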

Tags (AI)

#scraping #knowledge-management #automation #transcription #archiving
Safety Score: 3/5

Flags: network-access, file-write, file-read, code-execution