What This Skill Does

The xiaohongshu-extract skill parses metadata from Xiaohongshu (XHS) URLs. It fetches the target URL and reads the window.__INITIAL_STATE__ object embedded in the page source, then extracts note titles, descriptions, engagement metrics, user profiles, and video stream details. The result is structured JSON suitable for downstream analysis, content aggregation, or automated reporting. Output options include flattened data records and custom JSON error reports, so the skill fits both individual research and automated pipelines.
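
To illustrate the parsing step described above, the sketch below pulls the window.__INITIAL_STATE__ assignment out of an HTML string and decodes it. It is not the skill's actual implementation: the function name extract_initial_state, the regular expression, and the undefined-to-null substitution are assumptions about how such an embedded state object might be handled.

```python
import json
import re

# Assumed shape of the embedded assignment: the object literal runs up to the
# closing </script> tag, optionally followed by a semicolon. Hypothetical
# pattern, not taken from the skill's source.
STATE_RE = re.compile(
    r"window\.__INITIAL_STATE__\s*=\s*(\{.*?\})\s*;?\s*</script>",
    re.DOTALL,
)


def extract_initial_state(html: str) -> dict:
    """Return the embedded state object from page source as a dict."""
    match = STATE_RE.search(html)
    if match is None:
        raise ValueError("window.__INITIAL_STATE__ not found in page source")
    raw = match.group(1)
    # Bare `undefined` is valid JavaScript but not valid JSON, so swap it for
    # null. Caveat: this naive substitution also rewrites the word inside
    # string values.
    raw = re.sub(r"\bundefined\b", "null", raw)
    return json.loads(raw)
```

Taking the HTML as a string keeps the parsing logic independent of whichever HTTP client fetches the page, which makes it easy to exercise against saved page sources.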

Installation

Install the skill with the OpenClaw command-line interface by running the following command in your terminal:

clawhub install openclaw/skills/skills/jovijovi/xiaohongshu-extract

The skill's script runs in Python and relies on web scraping libraries to retrieve data from XHS servers, so make sure those dependencies are installed in your Python environment before running it.

Use Cases

Content Aggregation: Collect engagement data (likes, collections, shares) for a specific set of XHS notes to track performance over time.
Metadata Analysis: Automatically extract tags and descriptions to perform sentiment analysis or keyword mapping for viral marketing trends.
Asset Archiving: Retrieve video stream URLs and user information to facilitate the building of an offline media archive or content portfolio.
Research: Quickly extract data from URLs provided by users to answer questions about specific notes without having to manually parse page source code.

Example Prompts

"Analyze this note URL: https://www.xiaohongshu.com/explore/xxxxxx and give me the engagement stats and the author's nickname."
"Can you extract the video URL and technical specs from this XHS link? I need to know the resolution and duration."
"Fetch the full metadata for this XHS note and provide it as a flattened JSON object for my spreadsheet input."
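
The flattened output requested in the last prompt could look roughly like the result of the sketch below. The skill's actual flattened layout is not documented here, so the dotted-key convention, the flatten helper, and the field names in the usage example are all hypothetical.

```python
def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Collapse nested dicts into one level using dotted keys (assumed convention)."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as-is under the dotted key.
            flat[full_key] = value
    return flat


# Hypothetical note record; the real field names may differ.
note = {"title": "t", "user": {"nickname": "n"}, "interact": {"liked": 3}}
row = flatten(note)
```

Each key in the flattened dict maps directly to a spreadsheet column, which is why a single-level record is convenient for that kind of input.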

Tips & Limitations

URL Format: Always prefer discovery URLs over share URLs for higher reliability. If the script fails to parse the initial state, verify that the link is public and accessible.
Rate Limiting: Be aware that excessive scraping may trigger anti-bot measures on the XHS platform. Use this tool responsibly within the platform's terms of service.
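Dynamic Content: Some pages load content dynamically via JavaScript after the initial page load, so that content may be missing from the embedded window.__INITIAL_STATE__ object. If you receive an error, check that the URL is valid and the page is still active.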
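
One way to act on the Rate Limiting tip when processing several notes is to pause between requests and back off after failures. The sketch below is a minimal illustration under stated assumptions: fetch_politely, its parameters, and the {"error": ...} record are hypothetical names, and the fetch callable stands in for whatever actually retrieves and parses a note.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=2.0, max_retries=3, sleep=time.sleep):
    """Call fetch(url) for each URL with a pause between URLs and backoff on errors."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Fixed pause between consecutive URLs.
            sleep(delay_seconds)
        for attempt in range(max_retries):
            try:
                results[url] = fetch(url)
                break
            except Exception as exc:  # broad on purpose for this sketch
                if attempt == max_retries - 1:
                    # Give up and record a JSON-friendly error entry (hypothetical shape).
                    results[url] = {"error": str(exc)}
                else:
                    # Exponential backoff: 2x, 4x, ... the base delay.
                    sleep(delay_seconds * 2 ** (attempt + 1))
    return results
```

Passing sleep as a parameter is a design choice that lets the pacing logic be checked without real waiting. The delays shown are placeholders, not values known to satisfy the platform's limits.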
