Official Verified developer tools Safety 3/5

crawlee-web-scraper

Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single URLs, bulk file input, and automatic fallback from requests to Crawlee on 403/429 responses.

What This Skill Does

The crawlee-web-scraper skill helps OpenClaw agents fetch pages that reject standard HTTP clients. Plain fetchers are often blocked by Cloudflare, reCAPTCHA, or simple rate limits; this skill uses the Crawlee library to send requests that look like ordinary browser traffic. It provides two interfaces: a command-line scraper for high-volume tasks, and a drop-in library helper that retries a standard request through a Crawlee session when the response is 403, 429, or 503. It supports full HTML capture, text extraction, and batch processing of URLs listed in a text file, and it returns structured JSON for further agent processing.
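The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual helper: the names `fetch_with_fallback`, `Fetcher`, and `FetchResult` are hypothetical, and the two fetchers are stubs so the example runs offline.

```python
from typing import Callable, NamedTuple, Tuple

# Status codes treated as "blocked", taken from the description above.
BLOCKED_STATUSES = frozenset({403, 429, 503})

# Hypothetical interface: a fetcher takes a URL and returns (status_code, body).
Fetcher = Callable[[str], Tuple[int, str]]


class FetchResult(NamedTuple):
    url: str
    status: int
    body: str
    via: str  # "light" or "heavy": which fetcher produced the result


def fetch_with_fallback(url: str, light: Fetcher, heavy: Fetcher) -> FetchResult:
    """Try the cheap fetcher first; escalate only on a blocked status."""
    status, body = light(url)
    if status not in BLOCKED_STATUSES:
        return FetchResult(url, status, body, "light")
    status, body = heavy(url)
    return FetchResult(url, status, body, "heavy")


# Stub fetchers standing in for requests and Crawlee (assumptions for the demo).
def blocked_light(url: str) -> Tuple[int, str]:
    return 403, "Forbidden"


def ok_heavy(url: str) -> Tuple[int, str]:
    return 200, "<html><body>pricing</body></html>"


result = fetch_with_fallback("https://example.com/pricing", blocked_light, ok_heavy)
print(result.via, result.status)  # prints "heavy 200"
```

In practice the light fetcher would presumably wrap something like `requests.get(url, timeout=10)` and return `(response.status_code, response.text)`, while the heavy fetcher would delegate to a Crawlee crawler. Passing both in as callables keeps the escalation rule easy to test on its own.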

Installation

To integrate this skill into your environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/bryantegomoh/crawlee-web-scraper

Ensure you have the required Python dependencies installed globally or in your agent's virtual environment:

pip install crawlee requests

Use Cases

Bypassing Bot Protection: Use this when a target website uses Cloudflare, Datadome, or similar providers that block standard requests.
Bulk Data Collection: Efficiently scrape lists of URLs from a file without worrying about aggressive rate-limiting causing your agent to stall.
Resilient Pipelines: Integrate into existing workflows where you want to start with a lightweight request but automatically fall back to a robust scraper if the target server rejects the initial connection.
Clean Data Extraction: Quickly strip boilerplate HTML to get straight to the readable text content for LLM ingestion.
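To make the bulk-input and clean-text use cases concrete, here is a small standard-library sketch. It does not reproduce the skill's own extraction logic: `TextExtractor`, `html_to_text`, and `parse_url_list` are hypothetical names, and the rule that blank lines and `#` comments are skipped in the URL file is an assumption.

```python
import json
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def parse_url_list(raw: str) -> List[str]:
    """One URL per line; blank lines and '#' comments are skipped (an assumption)."""
    lines = (line.strip() for line in raw.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


urls = parse_url_list("https://example.com/pricing\n\n# staging\nhttps://example.com/docs\n")
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Plans</h1><p>Basic tier</p><script>x=1</script></body></html>"
)
record = {"url": urls[0], "text": html_to_text(page)}
print(json.dumps(record))
```

A real pipeline would read the URL list from a file such as `tech_sites.txt` and fetch each page before extracting; the inline strings here only keep the example self-contained.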

Example Prompts

"Use crawlee-web-scraper to fetch the latest tech news from these 50 URLs listed in tech_sites.txt and save the clean text to news_data.json."
"I'm getting a 403 Forbidden error when trying to access the documentation page at https://target-site.com/api. Can you switch to the crawlee-web-scraper to bypass this check?"
"Scrape the content of https://example.com/pricing and extract only the main body text so I can summarize their subscription tiers."

Tips & Limitations

Performance: Crawlee is resource-intensive compared to basic requests. Only use it when standard fetches fail or are likely to fail.
Rate Limiting: Even with bot evasion, respect robots.txt and avoid hitting servers with excessive concurrency that could be perceived as a DDoS attack.
Execution Time: Because this often spins up a browser instance or simulates complex handshakes, individual requests may take significantly longer than standard HTTP GET calls.
Memory: When scraping large bulk lists, ensure your system has enough memory, as concurrent browser instances can consume significant RAM.
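The rate-limiting and memory tips above suggest two guards that are cheap to add around any scraper: a robots.txt check and a cap on concurrent fetches. The sketch below uses only the standard library; `allowed_by_robots`, `fetch_all`, the `max_concurrency` parameter, and the `my-agent` user agent are hypothetical, and `fake_fetch` is a stand-in for a real fetch.

```python
import asyncio
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the text of an already-downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


async def fetch_all(urls, fetch_one, max_concurrency=2):
    """Run fetch_one over urls with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return await fetch_one(url)

    # gather preserves the order of the input URLs in its results.
    return await asyncio.gather(*(guarded(u) for u in urls))


ROBOTS = "User-agent: *\nDisallow: /private/\n"
candidates = [
    "https://example.com/pricing",
    "https://example.com/private/data",
    "https://example.com/docs",
]
permitted = [u for u in candidates if allowed_by_robots(ROBOTS, "my-agent", u)]


async def fake_fetch(url):
    # Stand-in for a real (slow, memory-hungry) browser fetch.
    await asyncio.sleep(0)
    return url.upper()


results = asyncio.run(fetch_all(permitted, fake_fetch, max_concurrency=2))
print(results)
```

Keeping `max_concurrency` small bounds both the load placed on the target server and the number of browser instances held in memory at once.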

Metadata

Author: @bryantegomoh

Stars: 4190

Updated: 2026-04-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-bryantegomoh-crawlee-web-scraper": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#web-scraping #automation #bot-evasion #data-extraction #crawlee

Safety Score: 3/5

Flags: network-access, file-write, file-read
