fetch-archive-to-lexiang
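A general-purpose article fetching and archiving tool. It fetches the full text of an article from any URL (free, paywalled, or behind a login wall), converts it to structured Markdown, and can optionally save it to a LeXiang (乐享) knowledge base. It manages login state for paid platforms such as Substack, Medium, and 知识星球 (Zhishi Xingqiu). It also supports YouTube video download (yt-dlp), podcast audio download (小宇宙 FM / Xiaoyuzhou FM and others), audio transcription (Whisper), and translation (side-by-side Chinese-English format), and it uploads the audio, video, and transcripts to the LeXiang knowledge base (transcripts use the online document format, which supports block-level editing). Trigger keywords: 抓取文章 (fetch article), 获取全文 (get full text), 付费文章 (paid article), 转存知识库 (save to knowledge base), 乐享 (LeXiang), 保存原文 (save original), fetch article, 归档 (archive), YouTube, 视频转录 (video transcription), 字幕提取 (subtitle extraction), 视频下载 (video download), 播客 (podcast), podcast, 小宇宙 (Xiaoyuzhou), xiaoyuzhou.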
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/ajaxhe/fetch-archive-to-lexiang
What This Skill Does
The fetch-archive-to-lexiang skill is an advanced, all-in-one automation agent designed for professional content curation and knowledge management. It bridges the gap between fragmented online consumption and structured personal knowledge bases. Beyond simple web scraping, it excels at handling complex scenarios, including paywalled articles, platforms with anti-bot protections like Cloudflare, and multimedia content such as YouTube videos and podcasts. By leveraging browser automation (Playwright and CDP), it ensures high-fidelity captures, including images and metadata, while offering seamless integration with LeXiang (乐享) knowledge management systems for long-term archiving.
Installation
To install this skill, use the following command in your OpenClaw terminal:
clawhub install openclaw/skills/skills/ajaxhe/fetch-archive-to-lexiang
Use Cases
- Professional Research: Automatically archive paywalled reports from sites like The Information or Caixin, preserving full formatting and images.
- Multimedia Preservation: Automatically convert YouTube technical tutorials or Podcasts from Xiaoyuzhou into searchable, translated Markdown transcripts with local media backups.
- Content Curation: Extract full-text content from newsletters (Substack) and member-only platforms, ensuring a centralized, clean backup of your reading list.
- Knowledge Base Synchronization: Regularly feed research materials directly into LeXiang documentation, categorizing content by date and source for efficient retrieval.
Example Prompts
- "Fetch this Substack article (https://substack.com/post/123) and archive it to my LeXiang knowledge base with full images."
- "Download the latest podcast from this Xiaoyuzhou URL, transcribe it using Whisper, translate it to English, and save the transcript as a structured document in LeXiang."
- "Grab this paywalled article from the tech blog using CDP mode because it's behind a Cloudflare wall, and extract the full content into a markdown file."
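The CDP mode named in the last prompt depends on a locally running Chrome that exposes a remote debugging port. As a minimal sketch, assuming port 9222 and Chrome's standard `/json/version` DevTools HTTP endpoint, a pre-flight check might look like the following. The helper names `cdp_version_url` and `cdp_is_reachable` are hypothetical and are not part of the skill itself.

```python
import http.client
import json
import urllib.error
import urllib.request


def cdp_version_url(port: int = 9222, host: str = "127.0.0.1") -> str:
    """Build the URL of Chrome's DevTools version endpoint (hypothetical helper)."""
    return f"http://{host}:{port}/json/version"


def cdp_is_reachable(port: int = 9222, timeout: float = 2.0) -> bool:
    """Return True if a DevTools endpoint answers with a JSON object on the port."""
    try:
        with urllib.request.urlopen(cdp_version_url(port), timeout=timeout) as resp:
            info = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed HTTP, or a non-JSON reply.
        return False
    # Chrome reports the browser-level WebSocket URL under this key.
    return isinstance(info, dict) and "webSocketDebuggerUrl" in info


if __name__ == "__main__":
    if cdp_is_reachable():
        print("Chrome is listening for CDP connections on port 9222.")
    else:
        print("No CDP endpoint found; start Chrome with --remote-debugging-port=9222.")
```

Running such a check before a fetch gives a clear message when Chrome was started without the debugging flag, rather than a failure partway through the capture.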
Tips & Limitations
- Use CDP for Security: Always use --cdp when targeting sites with strict bot protection like OpenAI or LinkedIn. Ensure your local Chrome browser is running with --remote-debugging-port=9222.
- Prioritize fetch_article.py: Avoid the generic web_fetch for visual content. fetch_article.py is mandatory if you need to retain images, charts, or screenshots.
- Formatting: The tool automatically handles file naming by converting illegal URL characters to hyphens; ensure your system handles the generated _meta.json files for optimal organization.
- Dependencies: This tool requires local Python environment setup and persistent Chrome access for advanced authentication workflows.
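The file-naming tip above says only that illegal URL characters become hyphens; the exact rule the tool applies is not documented here. The sketch below is one plausible reading, with the function name `url_to_filename`, the allowed character set, and the length cap all being assumptions chosen for illustration.

```python
import re


def url_to_filename(url: str, max_length: int = 120) -> str:
    """Turn a URL into a filesystem-friendly name (illustrative rule, not the tool's)."""
    # Drop the scheme so names do not begin with "https-".
    stripped = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)
    # Keep letters, digits, dots, and underscores; any other run becomes one hyphen.
    hyphenated = re.sub(r"[^A-Za-z0-9._]+", "-", stripped)
    # Trim stray hyphens, cap the length, and fall back to a fixed name if empty.
    name = hyphenated.strip("-")[:max_length].rstrip("-")
    return name or "untitled"


if __name__ == "__main__":
    print(url_to_filename("https://substack.com/post/123"))
```

Collapsing each run of unsafe characters into a single hyphen keeps names readable for URLs with long query strings, at the cost of two different URLs occasionally mapping to the same name.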
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-ajaxhe-fetch-archive-to-lexiang": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: network-access, file-write, file-read, code-execution
Related Skills
lexiang-knowledge-base
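A dedicated skill for accessing the LeXiang (乐享) knowledge base platform. It should be invoked first when the user explicitly mentions keywords such as 「乐享」, 「lexiang」, 「知识库」 (knowledge base), 「知识」 (knowledge), or 「文档」 (document), or when the host of a user-provided link is lexiangla.com. The skill supports fetching document content and metadata, searching document content, querying knowledge bases and directory structures, creating, editing, and moving documents, managing tags and comments, uploading files, and maintaining attachments.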
lexiang
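Tencent LeXiang (腾讯乐享) knowledge base API integration. It provides full CRUD operations on teams, knowledge bases, knowledge nodes, and online document blocks, along with contact directory management, AI search and Q&A, file upload, and task management. This skill suits scenarios that manage LeXiang knowledge base content through the API, such as creating, querying, and editing documents, searching knowledge, and managing team permissions.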