web-scraping
Web scraping tools for fetching and extracting data from web pages
Why use this skill?
Learn how to use the OpenClaw web-scraping skill to fetch, extract, and research data from the web. Master tools for URL parsing and link discovery.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/paulgnz/xpr-web-scraping
What This Skill Does
The web-scraping skill empowers your OpenClaw agent to interact directly with the live web. It provides a robust set of tools for fetching content from URLs, extracting relevant data, and discovering navigational structures across multiple domains. Whether you are analyzing a single technical document, tracking news updates across various sources, or performing complex research that requires cross-referencing multiple URLs, this skill provides the necessary interface to convert raw HTML into clean, usable formats like text or markdown. By handling the complexities of fetching, deduplicating links, and filtering content, the agent can focus on synthesizing information rather than wrestling with raw markup.
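To make the fetch-and-convert idea concrete, here is a minimal sketch of what such a tool does under the hood: download a page, strip markup noise, and return readable text. This is an illustrative example using the common requests and beautifulsoup4 libraries, not the skill's actual implementation; the scrape_page helper name is hypothetical.

# Illustrative sketch only, not the skill's internal code.
# Assumes the third-party packages `requests` and `beautifulsoup4` are installed.
import requests
from bs4 import BeautifulSoup

MAX_BYTES = 5 * 1024 * 1024  # mirrors the documented 5MB per-page cap

def scrape_page(url: str) -> str:
    """Fetch a URL and return its content as plain text (hypothetical helper)."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    if len(resp.content) > MAX_BYTES:
        raise ValueError("Page exceeds 5MB; point the agent at a more specific sub-page")
    soup = BeautifulSoup(resp.text, "html.parser")
    # Drop script/style noise so the agent sees readable prose, not raw markup.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)

print(scrape_page("https://example.com")[:500])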
Installation
To integrate this skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/paulgnz/xpr-web-scraping
This installs the necessary dependencies authored by paulgnz from the official openclaw/skills repository.
Use Cases
- Market Research: Gather data from various competitor websites concurrently using scrape_multiple to identify trends.
- Document Retrieval: Use extract_links with regex patterns to isolate and download all PDF reports from a corporate investor relations page (see the sketch after this list).
- Content Summarization: Fetch a long-form article using scrape_url in markdown format to maintain context while generating a concise summary.
- Database Population: Extract structured data points from a series of product pages to create a CSV file for your project.
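The Document Retrieval pattern above (find every link matching a pattern and resolve it to an absolute, deduplicated URL) can be approximated with a few lines of standard Python. The extract_pdf_links helper below is hypothetical and only illustrates the idea behind extract_links; it is not the skill's API.

# Illustrative sketch only; the skill's extract_links tool may behave differently.
import re
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

def extract_pdf_links(page_url: str) -> list[str]:
    """Collect deduplicated absolute URLs for links ending in .pdf (illustrative only)."""
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    seen, links = set(), []
    for a in soup.find_all("a", href=True):
        href = urljoin(page_url, a["href"])  # resolve relative links against the page URL
        if re.search(r"\.pdf($|\?)", href, re.I) and href not in seen:
            seen.add(href)
            links.append(href)
    return links

print(extract_pdf_links("https://example.com/investor-relations"))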
Example Prompts
- "Scrape the documentation pages at the following three URLs and summarize the key security updates for each: [URL1, URL2, URL3]."
- "Go to the official project website, extract all links ending in .pdf, and compile a list of their titles and absolute URLs."
- "Fetch the content of this landing page as markdown and tell me what the primary call-to-action is."
Tips & Limitations
- Rate Limiting: Adhere to a maximum of 5 requests per minute per domain to ensure reliability and respect server resources (a simple throttling sketch follows this list).
- Format Selection: Always choose format="text" for heavy analysis to save tokens, and use format="markdown" only when preserving headers and structure is vital.
- Data Size: Be aware that individual page content is capped at 5MB. If a page exceeds this, you may need to focus the agent on specific sub-pages.
- Persistence: Always pair the scraping results with store_deliverable to ensure the scraped data persists as evidence after the session ends.
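One way to honor the 5-requests-per-minute guideline when scripting your own fetches is a small per-domain throttle like the sketch below. This is an illustrative pattern, not part of the skill; the DomainThrottle class and its 12-second spacing (60 seconds divided by 5 requests) are assumptions made for the example.

# Illustrative per-domain throttle; not provided by the skill itself.
import time
from collections import defaultdict
from urllib.parse import urlparse

class DomainThrottle:
    """Space requests so no single domain sees more than ~5 per minute."""
    def __init__(self, max_per_minute: int = 5):
        self.min_interval = 60.0 / max_per_minute  # 12 seconds between hits to one domain
        self.last_hit = defaultdict(float)

    def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        elapsed = time.monotonic() - self.last_hit[domain]
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_hit[domain] = time.monotonic()

throttle = DomainThrottle()
for url in ["https://example.com/a", "https://example.com/b"]:
    throttle.wait(url)  # blocks just long enough to stay under the per-domain limit
    # ... fetch url here ...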
Metadata
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-paulgnz-xpr-web-scraping": {
"enabled": true,
"auto_update": true
}
}
}
Tags: AI
Flags: network-access, data-collection
Related Skills
governance
XPR Network governance — communities, proposals, voting on the gov contract
lending
LOAN Protocol lending and borrowing on XPR Network (lending.loan contract)
nft
Full AtomicAssets/AtomicMarket NFT lifecycle on XPR Network
xpr-agent-operator
Operate an autonomous AI agent on XPR Network's trustless registry
code-sandbox
Execute JavaScript code in a sandboxed VM for data processing and computation