web-scraping
Web scraping tools for fetching and extracting data from web pages
Why use this skill?
Learn how to use the OpenClaw web-scraping skill to fetch, extract, and research data from the web. Master tools for URL parsing and link discovery.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/paulgnz/xpr-web-scraping
What This Skill Does
The web-scraping skill empowers your OpenClaw agent to interact directly with the live web. It provides a robust set of tools for fetching content from URLs, extracting relevant data, and discovering navigational structures across multiple domains. Whether you are analyzing a single technical document, tracking news updates across various sources, or performing complex research that requires cross-referencing multiple URLs, this skill provides the necessary interface to convert raw HTML into clean, usable formats like text or markdown. By handling the complexities of fetching, deduplicating links, and filtering content, the agent can focus on synthesizing information rather than wrestling with raw markup.
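To make the fetch-and-convert idea concrete, here is a minimal sketch of what such a tool does under the hood: download a page, strip markup noise, and return readable text. This is an illustrative example using the common requests and beautifulsoup4 libraries, not the skill's actual implementation; the scrape_page helper name is hypothetical.

# Illustrative sketch only, not the skill's internal code.
# Assumes the third-party packages `requests` and `beautifulsoup4` are installed.
import requests
from bs4 import BeautifulSoup

MAX_BYTES = 5 * 1024 * 1024  # mirrors the documented 5MB per-page cap

def scrape_page(url: str) -> str:
    """Fetch a URL and return its content as plain text (hypothetical helper)."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    if len(resp.content) > MAX_BYTES:
        raise ValueError("Page exceeds 5MB; point the agent at a more specific sub-page")
    soup = BeautifulSoup(resp.text, "html.parser")
    # Drop script/style noise so the agent sees readable prose, not raw markup.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)

print(scrape_page("https://example.com")[:500])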
Installation
To integrate this skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/paulgnz/xpr-web-scraping
This installs the necessary dependencies authored by paulgnz from the official openclaw/skills repository.
Use Cases
- Market Research: Gather data from various competitor websites concurrently using scrape_multiple to identify trends.
- Document Retrieval: Use extract_links with regex patterns to isolate and download all PDF reports from a corporate investor relations page (see the sketch after this list).
- Content Summarization: Fetch a long-form article using scrape_url in markdown format to maintain context while generating a concise summary.
- Database Population: Extract structured data points from a series of product pages to create a CSV file for your project.
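The Document Retrieval pattern above (find every link matching a pattern and resolve it to an absolute, deduplicated URL) can be approximated with a few lines of standard Python. The extract_pdf_links helper below is hypothetical and only illustrates the idea behind extract_links; it is not the skill's API.

# Illustrative sketch only; the skill's extract_links tool may behave differently.
import re
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

def extract_pdf_links(page_url: str) -> list[str]:
    """Collect deduplicated absolute URLs for links ending in .pdf (illustrative only)."""
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    seen, links = set(), []
    for a in soup.find_all("a", href=True):
        href = urljoin(page_url, a["href"])  # resolve relative links against the page URL
        if re.search(r"\.pdf($|\?)", href, re.I) and href not in seen:
            seen.add(href)
            links.append(href)
    return links

print(extract_pdf_links("https://example.com/investor-relations"))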
Example Prompts
- "Scrape the documentation pages at the following three URLs and summarize the key security updates for each: [URL1, URL2, URL3]."
- "Go to the official project website, extract all links ending in .pdf, and compile a list of their titles and absolute URLs."
- "Fetch the content of this landing page as markdown and tell me what the primary call-to-action is."
Tips & Limitations
- Rate Limiting: Adhere to a maximum of 5 requests per minute per domain to ensure reliability and respect server resources (a simple throttling sketch follows this list).
- Format Selection: Always choose format="text" for heavy analysis to save tokens, and use format="markdown" only when preserving headers and structure is vital.
- Data Size: Be aware that individual page content is capped at 5MB. If a page exceeds this, you may need to focus the agent on specific sub-pages.
- Persistence: Always pair the scraping results with store_deliverable to ensure the scraped data persists as evidence after the session ends.
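One way to honor the 5-requests-per-minute guideline when scripting your own fetches is a small per-domain throttle like the sketch below. This is an illustrative pattern, not part of the skill; the DomainThrottle class and its 12-second spacing (60 seconds divided by 5 requests) are assumptions made for the example.

# Illustrative per-domain throttle; not provided by the skill itself.
import time
from collections import defaultdict
from urllib.parse import urlparse

class DomainThrottle:
    """Space requests so no single domain sees more than ~5 per minute."""
    def __init__(self, max_per_minute: int = 5):
        self.min_interval = 60.0 / max_per_minute  # 12 seconds between hits to one domain
        self.last_hit = defaultdict(float)

    def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        elapsed = time.monotonic() - self.last_hit[domain]
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_hit[domain] = time.monotonic()

throttle = DomainThrottle()
for url in ["https://example.com/a", "https://example.com/b"]:
    throttle.wait(url)  # blocks just long enough to stay under the per-domain limit
    # ... fetch url here ...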
Metadata
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-paulgnz-xpr-web-scraping": {
"enabled": true,
"auto_update": true
}
}
}
Tags: AI
Flags: network-access, data-collection
Related Skills
governance
XPR Network governance — communities, proposals, voting on the gov contract
lending
LOAN Protocol lending and borrowing on XPR Network (lending.loan contract)
nft
Full AtomicAssets/AtomicMarket NFT lifecycle on XPR Network
xpr-agent-operator
Operate an autonomous AI agent on XPR Network's trustless registry
code-sandbox
Execute JavaScript code in a sandboxed VM for data processing and computation