news-hot-scraper
This skill should be used when users need to scrape hot news topics from Chinese platforms (Weibo 微博, Zhihu 知乎, Bilibili B站, Douyin 抖音, Toutiao 今日头条, Tencent News 腾讯新闻, The Paper 澎湃新闻), generate summaries, and cite sources. It supports both API-based and direct scraping methods, and offers both extractive and abstractive summarization.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/13923870749/newsscraper
What This Skill Does
The news-hot-scraper skill aggregates, processes, and summarizes trending news topics from major Chinese platforms: Weibo, Zhihu, Bilibili, Douyin, Toutiao, Tencent News, and The Paper. It offers two data acquisition modes: an API-based aggregator for fast collection and a direct-scraping engine for more granular content extraction. For summarization, users can choose between 'extractive' summarization, which selects key sentences for quick highlights, and 'abstractive' summarization, which uses HuggingFace transformers to generate fluent natural-language reports. Each result is tagged with metadata: source platform, publication time, and a direct URL for verification.
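To make the 'extractive' mode concrete, here is a minimal sketch of frequency-based sentence selection. It is not the skill's actual implementation: the function names, the character-frequency scoring, and the sentence splitter are all assumptions chosen for illustration, and it uses only the standard library.

```python
import re
from collections import Counter

def split_sentences(text):
    """Split text after Chinese or Western sentence-ending marks (assumed splitter)."""
    parts = re.split(r"(?<=[。！？!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]

def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order.

    Hypothetical scoring: average document-wide frequency of the
    word characters in each sentence.
    """
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"\w", text))

    def score(sentence):
        chars = re.findall(r"\w", sentence)
        if not chars:
            return 0.0
        return sum(freq[c] for c in chars) / len(chars)

    # Rank sentence indices by score; ties keep document order (stable sort).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Scoring by single characters avoids the need for a Chinese word segmenter in this sketch; a real pipeline would more likely score segmented words or sentence embeddings.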
Installation
To integrate this skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/13923870749/newsscraper
Ensure that you have Python 3.8+ installed, as the skill relies on libraries such as requests, BeautifulSoup4, and transformers for its core operations. Refer to the internal documentation for specific environment configuration.
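As a quick pre-flight check, the sketch below verifies the interpreter version and looks for the libraries named above. The helper names are hypothetical and not part of the skill; `bs4` is the import name of the BeautifulSoup4 package.

```python
import importlib.util
import sys

# Import names of the libraries mentioned above (bs4 is BeautifulSoup4).
REQUIRED_MODULES = ["requests", "bs4", "transformers"]

def python_is_supported(minimum=(3, 8)):
    """True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum

def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` before the first run gives a list of packages that still need to be installed; an empty list means all three were found.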
Use Cases
- Market Research: Quickly gather trending topics within specific industries (e.g., AI or Finance) to understand public sentiment.
- Daily Briefing: Automate the creation of a daily morning news summary based on preferred platforms.
- Competitive Intelligence: Monitor hot topics across various social media platforms simultaneously to identify viral trends.
- Content Curation: Extract and summarize news for newsletters or personal knowledge management systems.
Example Prompts
- "Gather the current top 20 trending topics from Weibo and Zhihu, and generate an abstractive summary of the top 5."
- "Search for the latest news regarding 'Generative AI' on Toutiao and Bilibili, and output the results as a JSON file with links."
- "Summarize today's hot news in the technology sector using the extractive method for a quick status update."
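For prompts that ask for JSON output with links, the result could be shaped roughly as follows. This is a sketch under assumptions: the `HotTopic` record and its field names are hypothetical, chosen to mirror the metadata the skill says it attaches (source platform, publication time, URL), and the sample values are placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

@dataclass
class HotTopic:
    """Hypothetical record for one trending topic."""
    title: str
    platform: str       # e.g. "Toutiao" or "Bilibili"
    published_at: str   # ISO 8601 timestamp as a string
    url: str            # direct link for verification
    summary: str = ""

def to_json(topics: List[HotTopic]) -> str:
    """Serialize topics to JSON, keeping Chinese text readable."""
    return json.dumps([asdict(t) for t in topics], ensure_ascii=False, indent=2)

# Placeholder data only, not a real scraped result.
sample = [
    HotTopic(
        title="Example topic",
        platform="Toutiao",
        published_at="2024-01-01T08:00:00+08:00",
        url="https://example.com/topic/1",
    )
]
```

Passing `ensure_ascii=False` keeps Chinese titles human-readable in the output file instead of escaping them as `\uXXXX` sequences.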
Tips & Limitations
- Performance: For massive data requests, prioritize the API mode to avoid rate-limiting and ensure faster response times.
- Ethics: Always adhere to the target website's robots.txt and include appropriate delays (1-3 seconds) between requests to prevent service disruption.
- Data Quality: The skill includes basic filtering, but always cross-reference critical data with the original links provided in the metadata.
- Customization: The news_summarizer.py script allows for model fine-tuning; users with technical knowledge can replace the default google/mt5-small-chinese model to better suit specialized domains.
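The robots.txt and delay advice can be sketched with the standard library alone. This is an illustration, not the skill's own code: the user-agent string and helper names are hypothetical, and the robots.txt content is passed in as text so the example runs without network access.

```python
import random
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "news-hot-scraper-example"  # hypothetical user-agent string

def build_robot_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]

def polite_delay(min_s=1.0, max_s=3.0, sleep=time.sleep):
    """Pause for a random 1-3 second interval and return the delay used."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

In a scraping loop, `polite_delay()` would be called between consecutive requests, and only URLs returned by `allowed_urls` would be fetched. The `sleep` parameter exists so the pause can be stubbed out in tests.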
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-13923870749-newsscraper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, external-api, code-execution