Skrape
Ethical web data extraction with robots exclusion protocol adherence, throttled scraping requests, and privacy-compliant handling ("Scrape responsibly!").
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/10oss/skrape
What This Skill Does
Skrape is a web data extraction agent for OpenClaw, built around ethical scraping. It balances information retrieval against strict adherence to the robots exclusion protocol (robots.txt), site Terms of Service, and evolving legal precedents such as hiQ v. LinkedIn and Van Buren v. United States. Skrape acts as a responsible intermediary: it throttles its requests (with a mandatory 2-3 second delay between them) and identifies itself with a proper User-Agent. It prioritizes official APIs where available, so developers avoid invasive scraping when an official data channel exists.
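As a minimal sketch of the robots.txt check described above (not Skrape's actual implementation, which this page does not show), Python's standard urllib.robotparser module can evaluate a URL against a set of rules. The is_allowed helper, the ExampleBot User-Agent string, and the sample rules are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical User-Agent; a real bot should point to its own contact page.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"

# Hypothetical rules, standing in for the contents of a site's /robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/products", "https://example.com/private/data"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS, url) else "disallowed"
        print(f"{url}: {verdict}")
```

Parsing the rules from a string keeps the example offline. Against a live site, the same parser can load the file itself via set_url() followed by read() before can_fetch() is called.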
Installation
To integrate Skrape into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/10oss/skrape
Ensure your local environment allows for outbound network requests, as the agent requires external connectivity to perform verification checks and data retrieval.
Use Cases
Skrape is best suited for:
- Market Research: Gathering public-facing product pricing or listing information to identify trends.
- Competitive Analysis: Auditing public data sets to inform business strategy without violating copyright.
- Content Aggregation: Creating curated lists of public news or factual data while providing proper source attribution.
- Legal Compliance Auditing: Automated checking of robots.txt and Terms of Service files across large domain sets.
Example Prompts
- "Skrape, check the robots.txt for example-ecommerce.com and if allowed, extract the current list of product names and prices for the electronics category."
- "Please research if there is a public API available for status.github.com. If not, safely scrape the latest service status updates, ensuring you include proper attribution."
- "Conduct a data discovery scan for public factual information regarding standard industry pricing for cloud storage, respecting all site access boundaries and throttling requests accordingly."
Tips & Limitations
To maintain high safety standards, Skrape enforces a mandatory 2-3 second delay between requests. Avoid requesting private, authenticated, or PII-heavy pages: the skill is configured to warn or halt when it detects restricted zones. Always prioritize official APIs; Skrape is not intended to bypass authentication walls or circumvent technical access controls. Remember that "publicly accessible" does not always mean "legally reusable", especially for creative design and proprietary compilations. Skrape encourages prompt deletion of unnecessary PII, but users must implement their own data retention policies to remain GDPR and CCPA compliant.
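The 2-3 second delay between requests could be enforced along the following lines. This is a sketch under stated assumptions, not Skrape's own code: the Throttle class and its parameters are hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without actually waiting.

```python
import random
import time


class Throttle:
    """Hypothetical helper that spaces calls 2-3 seconds apart (randomized)."""

    def __init__(self, min_delay=2.0, max_delay=3.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._min = min_delay
        self._max = max_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds slept."""
        pause = 0.0
        if self._last is not None:
            # Pick a fresh target gap each time so requests are not perfectly periodic.
            target = random.uniform(self._min, self._max)
            elapsed = self._clock() - self._last
            pause = max(0.0, target - elapsed)
            if pause > 0:
                self._sleep(pause)
        self._last = self._clock()
        return pause


if __name__ == "__main__":
    throttle = Throttle()
    for page in ("/page-1", "/page-2", "/page-3"):
        slept = throttle.wait()
        print(f"slept {slept:.2f}s before requesting {page}")
```

Calling wait() before every request keeps the gap at or above the minimum even when the time spent processing a response varies, because time already elapsed since the previous request counts toward the delay.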
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-10oss-skrape": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, data-collection