article-extract
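Extracts the main text of WeChat official account articles, blogs, news pages, and other web pages, bypassing anti-scraping mechanisms and outputting plain text.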
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/caozeal/article-extract
What This Skill Does
The article-extract skill is a utility for OpenClaw users that strips the noise from modern web pages. It extracts the core text from WeChat official account articles, technical blogs, and news websites. Unlike general-purpose scrapers, it is tuned to get past common anti-scraping mechanisms on platforms such as WeChat, and it returns clean, readable text without navigation bars, advertisement sidebars, or tracking scripts. The output is plain text, which makes it suitable for feeding into LLMs for summarization, analysis, or knowledge base archiving.
Installation
To integrate this skill into your environment, use the OpenClaw command line interface. Ensure you have a standard Python 3.6+ environment configured on your host machine. Run the following command in your terminal:
clawhub install openclaw/skills/skills/caozeal/article-extract
Once installed, the tool is immediately available to be invoked by the OpenClaw agent or through direct command line execution via python3 skills/article-extract/scripts/extract.py.
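For scripted use, the direct invocation mentioned above could be wrapped roughly as follows. This is a minimal sketch, not documented usage: the script path is quoted from the text, but passing the URL as a single positional argument and reading the extracted text from standard output are assumptions, and build_command and run_extract are hypothetical helper names.

```python
import subprocess
import sys

# Path quoted from the installation notes above.
SCRIPT_PATH = "skills/article-extract/scripts/extract.py"


def build_command(url):
    """Build the argument list for one extraction run.

    ASSUMPTION: the script takes the URL as its only positional argument.
    Check the script's own usage text before relying on this.
    """
    return [sys.executable, SCRIPT_PATH, url]


def run_extract(url, timeout=60.0):
    """Run the script and return whatever it writes to standard output.

    ASSUMPTION: the extracted plain text is printed to stdout.
    """
    result = subprocess.run(
        build_command(url),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the command as a list rather than a shell string means the URL is never interpreted by a shell, so query strings containing & or ? need no quoting.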
Use Cases
This skill is highly versatile for information management. Content creators can use it to gather research materials for blog posts or newsletters. Researchers can automate the collection of large volumes of long-form articles for trend analysis. Developers can utilize it to parse documentation or news feeds into formats suitable for training local models. It is also an excellent tool for personal knowledge management, allowing you to convert ephemeral web links into static, plain-text backups.
Example Prompts
- "Extract the content from this WeChat article: https://mp.weixin.qq.com/s/example123 and summarize the key technical points."
- "Use the article-extract tool on https://example.com/blog/deep-learning and save the plain text to a file named research_notes.txt."
- "Please read the provided URL and tell me if it contains relevant information regarding the new OpenAI update: https://tech-news.com/article/openai-latest."
Tips & Limitations
For best results, make sure the URL is publicly accessible; the tool does not support pages behind paywalls or login screens, or pages that require session cookies. It is effective at cleaning HTML but works only on statically delivered content, so pages rendered entirely by client-side JavaScript (e.g., dynamic single-page applications) may not yield full content. Use standard URLs rather than shortened redirect links to reduce the chance of resolution errors. As a pure Python utility, it avoids heavy external headless browsers, which keeps it fast and private.
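The static-content limitation can be illustrated with a small standard-library sketch. This is not the skill's actual implementation; the TextExtractor class, the extract_text function, and the SKIP_TAGS set are hypothetical names chosen for illustration. It shows how a parser that never executes JavaScript recovers text from a static page but finds nothing in a single-page application shell.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as noise (an assumption made for this sketch).
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer", "noscript"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text outside SKIP_TAGS, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# A statically delivered article: the text is present in the HTML itself.
static_page = (
    "<html><body><nav>Home</nav><article><h1>Title</h1>"
    "<p>Body text.</p></article><script>track()</script></body></html>"
)

# A single-page application shell: the content only exists after JavaScript runs.
spa_page = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'

print(extract_text(static_page))
print(repr(extract_text(spa_page)))
```

The second page yields an empty string because its article text is never present in the delivered HTML, which is the case the tip above warns about.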
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-caozeal-article-extract": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: network-access, file-write, file-read