What This Skill Does

The news-hot-scraper skill aggregates, processes, and summarizes trending news topics from major Chinese platforms: Weibo, Zhihu, Bilibili, Douyin, Toutiao, Tencent News, and The Paper. It turns raw real-time feeds into structured results and offers two data acquisition modes: an API-based aggregator for fast collection, and a direct-scraping engine for finer-grained content extraction. For summarization, users can choose 'extractive' summarization, which selects key sentences for a quick overview, or 'abstractive' summarization, which uses HuggingFace transformers to generate fluent natural-language reports. Each entry is tagged automatically with metadata, so every result includes its source platform, publication time, and a direct URL for verification.
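
The skill's own summarizer code is not shown here. The sketch below illustrates what 'extractive' summarization means in general. It is an assumption-based illustration, not the skill's implementation. It scores each sentence by the average frequency of its characters across the whole text, then keeps the top k sentences in their original order. Character-level counting is an assumption made to avoid needing a Chinese word segmenter. The function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Split after Chinese or English sentence-ending punctuation (zero-width match).
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")
# Punctuation and whitespace that should not count toward a sentence's score.
_IGNORED = set("，。！？、；：“”‘’（）《》,.!?;:\"'()[] \t\n")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    # Character frequencies over the whole text act as a crude importance signal.
    freq = Counter(ch for ch in text if ch not in _IGNORED)

    def score(sentence: str) -> float:
        chars = [ch for ch in sentence if ch not in _IGNORED]
        # Average rather than sum, so long sentences are not favored automatically.
        return sum(freq[ch] for ch in chars) / len(chars) if chars else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore original reading order
    return [sentences[i] for i in keep]
```

For example, `extractive_summary(article_text, k=3)` returns the three sentences with the highest average character frequency. A more realistic version would likely use a word segmenter such as jieba and TF-IDF weights instead of raw character counts.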

Installation

To integrate this skill into your OpenClaw environment, execute the following command in your terminal:

clawhub install openclaw/skills/skills/13923870749/newsscraper

Ensure that you have Python 3.8+ installed, as the skill relies on libraries such as requests, BeautifulSoup4, and transformers for its core operations. Refer to the internal documentation for specific environment configuration.
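
Before installing, a quick standard-library check like the one below can confirm the interpreter version and whether the three libraries named above are importable. This is a hypothetical helper, not part of the skill. Note that BeautifulSoup4 is imported under the module name `bs4`.

```python
import importlib.util
import sys

# Import names of the libraries listed above (BeautifulSoup4 imports as "bs4").
REQUIRED_MODULES = ("requests", "bs4", "transformers")


def missing_requirements(modules=REQUIRED_MODULES, version=sys.version_info):
    """Return a list of problems; an empty list means the environment looks ready."""
    problems = []
    if tuple(version[:2]) < (3, 8):
        problems.append("Python 3.8+ required, found %d.%d" % (version[0], version[1]))
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: " + name)
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```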

Use Cases

Market Research: Quickly gather trending topics within specific industries (e.g., AI or finance) to gauge public sentiment.
Daily Briefing: Automate the creation of a daily morning news summary based on preferred platforms.
Competitive Intelligence: Monitor hot topics across various social media platforms simultaneously to identify viral trends.
Content Curation: Extract and summarize news for newsletters or personal knowledge management systems.

Example Prompts

"Gather the current top 20 trending topics from Weibo and Zhihu, and generate an abstractive summary of the top 5."
"Search for the latest news regarding 'Generative AI' on Toutiao and Bilibili, and output the results as a JSON file with links."
"Summarize today's hot news in the technology sector using the extractive method for a quick status update."
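
The second prompt asks for JSON output with links. The exact schema the skill emits is not documented here. The sketch below shows one way a record could carry the metadata described above: source platform, publication time, and direct URL. The class name, field names, and helper functions are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class NewsItem:
    """One trending entry. Field names are illustrative, not the skill's schema."""

    platform: str      # source platform, e.g. "Toutiao" or "Bilibili"
    title: str
    published_at: str  # publication time as an ISO 8601 string
    url: str           # direct link for verification
    summary: str = ""  # optional extractive or abstractive summary


def to_json(items: list[NewsItem]) -> str:
    """Serialize items as a JSON array; ensure_ascii=False keeps Chinese readable."""
    return json.dumps([asdict(item) for item in items], ensure_ascii=False, indent=2)


def save_json(items: list[NewsItem], path: str) -> None:
    """Write the JSON array to a UTF-8 file."""
    Path(path).write_text(to_json(items), encoding="utf-8")
```

Passing `ensure_ascii=False` keeps Chinese titles human-readable in the output file instead of escaping them as `\uXXXX` sequences.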

Tips & Limitations

Performance: For large data requests, prefer the API mode; it responds faster and is less likely to hit rate limits.
Ethics: Always respect the target website's robots.txt, add a delay of 1 to 3 seconds between requests, and keep request frequency low to avoid disrupting the service.
Data Quality: The skill includes basic filtering, but always cross-reference critical data with the original links provided in the metadata.
Customization: The news_summarizer.py script lets you swap or fine-tune the summarization model; technically inclined users can replace the default google/mt5-small-chinese model with one better suited to a specialized domain.
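
The skill's internal request logic is not documented here. The sketch below uses only the standard library to show the two habits the Ethics tip recommends: checking robots.txt before fetching, and waiting a random 1 to 3 seconds between requests. The helper names and the user-agent string are hypothetical. The `fetch` argument stands in for whatever HTTP call you use, for example a wrapper around `requests.get`.

```python
import random
import time
from urllib import robotparser

USER_AGENT = "news-hot-scraper-example"  # hypothetical user-agent string


def load_robots(robots_txt: str) -> robotparser.RobotFileParser:
    """Build a parser from the text of an already-downloaded robots.txt."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_pause(low: float = 1.0, high: float = 3.0, sleep=time.sleep) -> float:
    """Wait a random interval between low and high seconds; return the delay used."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


def fetch_all(urls, robots_txt, fetch, sleep=time.sleep):
    """Fetch every URL that robots.txt allows, pausing between consecutive requests."""
    rules = load_robots(robots_txt)
    results = {}
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            continue  # skip paths the site disallows
        if results:
            polite_pause(sleep=sleep)  # no pause before the first request
        results[url] = fetch(url)
    return results
```

The `sleep` parameter is injectable so the pacing can be tested without real waits. Instead of passing robots.txt text in, `RobotFileParser.set_url()` followed by `read()` can download it directly.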
