Scrape
Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling.
Why use this skill?
Safely extract web data with the OpenClaw Scrape skill. It features built-in robots.txt compliance, rate limiting, and PII stripping to help keep your scraping legal.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/ivangdavila/scrape
What This Skill Does
The Scrape skill is a web data extraction agent for the OpenClaw ecosystem, built with legal and ethical constraints in mind. It bridges the gap between raw web access and legal compliance by automating the discovery of site policies. The skill acts as a protective layer: it enforces robots.txt adherence, applies rate limiting, and helps keep your data collection within the bounds of privacy regulations such as GDPR and CCPA. By prioritizing public data over protected resources, it helps developers and analysts build datasets without the legal pitfalls associated with aggressive scraping.
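To make the robots.txt check described above concrete, here is a minimal sketch using Python's standard urllib.robotparser module. It is not the skill's actual implementation, which the listing does not show. The robots.txt content, the "example-bot" user agent, and the helper names are all hypothetical, and the rules are inlined so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
# Crawl-delay is a non-standard directive that some sites publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "example-bot" is an assumed user agent name, used only for illustration.
allowed_products = is_allowed(parser, "example-bot", "https://example-store.com/products")
allowed_account = is_allowed(parser, "example-bot", "https://example-store.com/account/orders")
crawl_delay = parser.crawl_delay("example-bot")  # seconds, or None if not declared
```

In a live setting you would typically point the parser at the site's real robots.txt with set_url() and read(), and skip any URL for which can_fetch() returns False.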
Installation
To integrate the Scrape skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/ivangdavila/scrape
Use Cases
- Market Research: Extracting public-facing product pricing or listing data from e-commerce sites to perform competitive analysis.
- Content Aggregation: Summarizing public news articles or blog posts while maintaining clear audit trails of source origins.
- Lead Qualification: Harvesting public corporate directory information to populate CRM systems, provided strict PII filtering is applied.
- Academic Research: Gathering public datasets from non-authenticated domains for statistical analysis and model training.
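The PII filtering mentioned under Lead Qualification can be sketched roughly as follows. This is an illustration only, not the skill's own filter: the regular expressions, placeholder strings, and function names are assumptions, and the patterns are deliberately simple. They catch common email and phone formats but leave other personal data, such as names, untouched.

```python
import re

# Deliberately simple, assumed patterns; real PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[email removed]", text)
    return PHONE_RE.sub("[phone removed]", text)


def redact_record(record: dict) -> dict:
    """Apply redact_pii to every string value; leave other values unchanged."""
    return {
        key: redact_pii(value) if isinstance(value, str) else value
        for key, value in record.items()
    }


# Hypothetical scraped directory entry.
sample = {
    "company": "Example Corp",
    "contact": "Jane Doe <jane.doe@example.com>, +1 (555) 010-9999",
    "employees": 120,
}
cleaned = redact_record(sample)
# cleaned["contact"] == "Jane Doe <[email removed]>, [phone removed]"
```

Note that the contact's name survives redaction here, which is one reason to review output manually before loading it into a CRM.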
Example Prompts
- "Scrape the public pricing table from example-store.com/products and save the data to a JSON format, ensuring you adhere to their robots.txt file first."
- "Research the latest announcements on tech-blog.org. Use the Scrape skill to pull the headlines, but ensure you strip any author email addresses to remain GDPR compliant."
- "Check if there is public contact information available for the company at industry-news.net. Please respect their rate limits and provide a log of the headers used for the request."
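For readers curious what respecting rate limits might look like in practice, the sketch below enforces a minimum interval between requests to the same host. The class name, the interval value, and the injectable clock and sleep parameters are all hypothetical. The listing does not document how the skill itself throttles requests.

```python
import time
from urllib.parse import urlparse


class HostRateLimiter:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url: str) -> float:
        """Block until the URL's host may be contacted again; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last_request[host] = now + delay
        return delay


# Assumed interval of 0.2 seconds, chosen only to keep the demo short.
limiter = HostRateLimiter(min_interval=0.2)
waits = [limiter.wait("https://industry-news.net/contact") for _ in range(3)]
```

Calling wait() before each request means the first request to a host goes out immediately and later ones are spaced out. If a site publishes a Crawl-delay value, that value would be a natural choice for min_interval.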
Tips & Limitations
The Scrape skill operates best when you provide it with clear instructions regarding the scope of the target site. Always prioritize site-provided APIs; if a site offers an official API, you must use it instead of scraping. Remember that the skill does not grant permission to bypass login walls; any attempt to access authenticated data is strictly prohibited and likely violates site terms of service. Always monitor your logs to verify that the PII-stripping features are functioning as expected, and treat the tool as a helpful assistant that requires your final approval before executing high-volume data operations.
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-ivangdavila-scrape": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, data-collection
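If your clawhub.json already lists other plugins, pasting by hand risks overwriting them. The following sketch merges the plugin entry shown above into an existing file. The helper name and the "some-other-plugin" entry are hypothetical, and the demo writes to a temporary directory because the listing does not say where clawhub.json lives.

```python
import json
import tempfile
from pathlib import Path

# Plugin entry copied from the listing above.
PLUGIN_ENTRY = {
    "official-ivangdavila-scrape": {"enabled": True, "auto_update": True}
}


def enable_plugin(config_path: Path, entry: dict) -> dict:
    """Merge entry into the "plugins" object of config_path, keeping other plugins."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("plugins", {}).update(entry)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a temporary file; point config_path at your real clawhub.json instead.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "clawhub.json"
    # Hypothetical pre-existing plugin, to show that it is preserved.
    config_path.write_text(
        json.dumps({"plugins": {"some-other-plugin": {"enabled": False}}}),
        encoding="utf-8",
    )
    merged = enable_plugin(config_path, PLUGIN_ENTRY)
```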