Skrape
Ethical web data extraction with robots exclusion protocol adherence, throttled scraping requests, and privacy-compliant handling ("Scrape responsibly!").
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/10sk/skrape
What This Skill Does
Skrape is a web data extraction skill for OpenClaw that prioritizes ethical data gathering. Rather than scraping aggressively, it builds the robots exclusion protocol (robots.txt) into its operational logic so that each extraction follows the site's own crawler policy. It throttles requests automatically, waiting at least 2-3 seconds between operations to reduce server load. The skill also provides a framework for distinguishing public factual data from sensitive personal information, which helps users work within data privacy laws such as GDPR and CCPA. By emphasizing source attribution and API-first discovery, Skrape lets researchers and developers acquire the data they need while maintaining professional integrity and reducing their exposure to legal risks under the CFAA and copyright law.
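The two mechanisms described above, a robots.txt check and a minimum delay between requests, can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not Skrape's actual implementation: the names `is_allowed` and `Throttle`, the 2-second default, and the `demo-bot` user agent are all assumptions made for the example.

```python
import time
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_delay: float = 2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self) -> float:
        """Sleep just long enough to respect min_delay; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    throttle = Throttle(min_delay=2.0)
    for url in ("https://example.com/products", "https://example.com/private/x"):
        if is_allowed(robots, "demo-bot", url):
            throttle.wait()
            print("would fetch", url)
        else:
            print("skipping (disallowed)", url)
```

In a real run the robots.txt text would be fetched from the target site first; it is passed in as a string here so the sketch stays offline and deterministic.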
Installation
To add the Skrape skill to your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/10sk/skrape
Ensure your agent has the necessary permissions for network requests before attempting to run your first extraction job.
Use Cases
- Market Intelligence: Safely gathering public pricing data and product listings across multiple e-commerce domains to perform competitive analysis.
- Academic Research: Extracting publicly available datasets from research portals while maintaining full logs of extraction activities for transparency and auditability.
- Content Aggregation: Programmatically checking for the availability of API endpoints as a preferred alternative to scraping, and executing controlled extractions only when authorized.
Example Prompts
- "Skrape, check the robots.txt for example-store.com, and if allowed, extract the product listing titles and prices for the current spring collection."
- "I need to pull public conference event schedules from tech-events.org. Please ensure you are throttling your requests to stay within their performance guidelines and log the activity."
- "Is there an API available for data-hub.gov? If not, perform a targeted extraction of the public statistics table while ensuring no PII is included in the output."
Tips & Limitations
To maximize the utility of Skrape, always prioritize official APIs if they are present. Skrape is designed to be a responsible citizen; it will intentionally pause or halt if it detects that a site has blocked its automated access. Remember that Skrape does not bypass login screens or access-controlled pages. Always review the data returned to ensure it does not inadvertently contain PII, and purge any unnecessary cached data to adhere to the principle of data minimization.
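As a rough aid for the PII review step, a simple pattern-based pass can flag or mask obvious identifiers before output is stored. The sketch below is an assumption-laden illustration, not part of Skrape: `redact_pii`, `contains_pii`, and both regular expressions are hypothetical, cover only email addresses and phone-like digit runs, and can produce false positives (for example on long numeric IDs) and false negatives. It does not replace a manual review.

```python
import re

# Heuristic patterns (assumptions for this sketch, not an exhaustive PII detector).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def contains_pii(text: str) -> bool:
    """Return True if the text matches either heuristic pattern."""
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))


def redact_pii(text: str) -> str:
    """Replace matched email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or +1 (555) 010-9999."
    print(contains_pii(sample))
    print(redact_pii(sample))
```

Running the redaction before caching results also supports data minimization, since the unmasked values never reach disk.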
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-10sk-skrape": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: network-access, data-collection