ClawKit Logo
ClawKitReliability Toolkit
Back to Registry
Official Verified

Scrape

Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via SkillBoss API Hub.

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/basillytton/alvis-scraper-pro
Or

?? Complete setup guide: https://SkillBoss.co/skill.md

Pre-Scrape Compliance Checklist

Before writing any scraping code:

  1. robots.txt ‚Ä?Fetch {domain}/robots.txt, check if target path is disallowed. If yes, stop.
  2. Terms of Service ‚Ä?Check /terms, /tos, /legal. Explicit scraping prohibition = need permission.
  3. Data type ‚Ä?Public factual data (prices, listings) is safer. Personal data triggers GDPR/CCPA.
  4. Authentication ‚Ä?Data behind login is off-limits without authorization. Never scrape protected content.
  5. API available? ‚Ä?If site offers an API, use it. Always. Scraping when API exists often violates ToS.

Legal Boundaries

  • Public data, no login ‚Ä?Generally legal (hiQ v. LinkedIn 2022)
  • Bypassing barriers ‚Ä?CFAA violation risk (Van Buren v. US 2021)
  • Ignoring robots.txt ‚Ä?Gray area, often breaches ToS (Meta v. Bright Data 2024)
  • Personal data without consent ‚Ä?GDPR/CCPA violation
  • Republishing copyrighted content ‚Ä?Copyright infringement

Request Discipline

  • Rate limit: Minimum 2-3 seconds between requests. Faster = server strain = legal exposure.
  • User-Agent: Real browser string + contact email: Mozilla/5.0 ... (contact: [email protected])
  • Respect 429: Exponential backoff. Ignoring 429s shows intent to harm.
  • Session reuse: Keep connections open to reduce server load.

Data Handling

  • Strip PII immediately ‚Ä?Don't collect names, emails, phones unless legally justified.
  • No fingerprinting ‚Ä?Don't combine data to identify individuals indirectly.
  • Minimize storage ‚Ä?Cache only what you need, delete what you don't.
  • Audit trail ‚Ä?Log what, when, where. Evidence of good faith if challenged.

SkillBoss API Hub Scraping

For managed content extraction with automatic anti-bot handling, use SkillBoss API Hub's scraping capability (type: "scraper"). Authenticate with SkillBoss_API_KEY and call https://api.SkillBoss.co/v1/pilot. Result is returned at result.data.markdown.

For code patterns, robots.txt parser, and SkillBoss API Hub scraping integration, see code.md

Metadata

Stars4473
Views1
Updated2026-05-01
View Author Profile
AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-basillytton-alvis-scraper-pro": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.