Official Verified developer tools Safety 3/5

scrapling

Web scraping using Scrapling — a Python framework with anti-bot bypass (Cloudflare Turnstile, fingerprint spoofing), adaptive element tracking, stealth headless browser, and full CSS/XPath extraction. Use when web_fetch fails (Cloudflare, JS-rendered pages), or when extracting structured data from websites (prices, articles, lists). Supports HTTP, stealth, and full browser modes. Source: github.com/D4Vinci/Scrapling (PyPI: scrapling). Only use on sites you have permission to scrape.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/damirikys/scrapling-fetcher

What This Skill Does

The Scrapling skill adds web scraping for cases where a plain web fetcher fails. It is built on the Scrapling Python framework, which provides anti-bot bypass (including Cloudflare Turnstile and fingerprint spoofing), adaptive element tracking, a stealth headless browser, and CSS/XPath extraction. It supports three fetcher modes:

  • http: standard requests with TLS fingerprint spoofing.
  • stealth: a headless browser that bypasses anti-bot measures.
  • dynamic: a full browser instance for JavaScript-heavy Single Page Applications (SPAs).

The skill suits extracting structured data such as prices, articles, and lists, especially when standard fetching fails because of Cloudflare protection or dynamically rendered content. It also offers inline Python usage for more complex scraping logic and an optional MCP server for AI-native scraping.
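As a rough illustration of the inline Python usage, the sketch below maps the three mode names onto Scrapling fetcher classes and returns the text of every element matching a CSS selector. The class and method names (Fetcher.get, StealthyFetcher.fetch, DynamicFetcher.fetch in scrapling.fetchers) reflect one reading of the Scrapling documentation and should be checked against the installed version. The helper fetch_texts is a hypothetical name, not part of the skill.

```python
import importlib

# Mode name -> (class in scrapling.fetchers, method to call).
# ASSUMPTION: these class and method names follow one reading of Scrapling's
# documentation; verify them against the version you have installed.
FETCHERS = {
    "http": ("Fetcher", "get"),
    "stealth": ("StealthyFetcher", "fetch"),
    "dynamic": ("DynamicFetcher", "fetch"),
}


def fetch_texts(url, selector, mode="http"):
    """Fetch url with the given mode and return the text of each CSS match."""
    if mode not in FETCHERS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(FETCHERS)}")
    class_name, method_name = FETCHERS[mode]
    # Imported lazily so this module loads even before scrapling is installed.
    fetchers = importlib.import_module("scrapling.fetchers")
    fetch = getattr(getattr(fetchers, class_name), method_name)
    page = fetch(url)
    # Use a plain element selector (for example "h1"), not a ::text selector,
    # so that each match has a .text attribute.
    return [str(element.text) for element in page.css(selector)]
```

A call such as fetch_texts("https://example.com", "h1") would use the http mode. Pass mode="stealth" or mode="dynamic" only on sites you are permitted to scrape.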

Installation

The Scrapling skill requires a one-time installation of its dependencies before first use: the scrapling package with all its extras, plus a browser binary. The download is approximately 200 MB, so the user is prompted to confirm before it starts. The commands are:

pip install "scrapling[all]"
patchright install chromium

The scrapling[all] extra installs the required libraries, including patchright (a stealth fork of Playwright) and curl_cffi. patchright install chromium then downloads the Chromium browser, which the stealth and dynamic fetcher modes need.
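Before prompting the user, it can help to check which of the two steps is still outstanding. The following is a minimal sketch using only the standard library. The function name missing_dependencies is hypothetical, and finding patchright on PATH is only a proxy: it does not prove that the Chromium binary has already been downloaded.

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return the install commands that still appear to be needed."""
    commands = []
    # The scrapling package is importable only after the pip step has run.
    if importlib.util.find_spec("scrapling") is None:
        commands.append('pip install "scrapling[all]"')
    # ASSUMPTION: a patchright executable on PATH is treated as a hint that
    # the browser step can run; it does not confirm Chromium is present.
    if shutil.which("patchright") is None:
        commands.append("patchright install chromium")
    return commands


if __name__ == "__main__":
    for command in missing_dependencies():
        print("still needed:", command)
```

An empty list means both checks passed, so the confirmation prompt can be skipped.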

Use Cases

Scrapling is particularly useful in situations where standard web requests are insufficient or blocked:

  • Cloudflare Protection: When web_fetch encounters Cloudflare challenges, 403 Forbidden, or 429 Too Many Requests errors, the --mode stealth option can be used to bypass these protections by employing a headless browser with anti-detect features. Note: This should only be done on sites where scraping is permitted.
  • JavaScript-Rendered Content: For websites that heavily rely on JavaScript to render their content (e.g., Single Page Applications), the --mode dynamic option, which uses a full browser instance, is necessary to fetch and parse the complete page.
  • Structured Data Extraction: When you need to extract specific data points like product prices, article text, or lists of items from a webpage, Scrapling's CSS selector or XPath extraction capabilities are highly effective. The --selector argument can be used to pinpoint the exact elements to extract.
  • Fallback for web_fetch: If web_fetch fails because of complex site structures or anti-bot measures, Scrapling offers an alternative way to retrieve the page content.
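The use cases above amount to a small decision rule for the --mode value. The sketch below is one way to express it. The function name choose_mode and its parameters are hypothetical, and the precedence (stealth wins over dynamic when both apply) is an assumption based on stealth mode also running a headless browser.

```python
# HTTP statuses that the use cases above associate with anti-bot blocking.
BLOCKED_STATUSES = {403, 429}


def choose_mode(status_code=None, challenge_detected=False, needs_javascript=False):
    """Pick a value for --mode from the symptoms of a failed plain fetch."""
    # ASSUMPTION: anti-bot symptoms take precedence over JavaScript rendering.
    if challenge_detected or status_code in BLOCKED_STATUSES:
        return "stealth"
    if needs_javascript:
        return "dynamic"
    return "http"


if __name__ == "__main__":
    print(choose_mode(status_code=429))
    print(choose_mode(needs_javascript=True))
```

Starting from http and escalating only when a symptom appears keeps the browser-based modes, which cost more resources, for the pages that need them.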

Example Prompts

  1. "Scrape the headlines from the homepage of example.com using CSS selectors and provide the output in JSON format."
  2. "Fetch the main article text from this URL [URL] using stealth mode to bypass Cloudflare, and output it as plain text."
  3. "Extract all product prices from the category page at [URL], which is heavily JavaScript-rendered. Use the dynamic mode and output the results as a list."

Tips & Limitations

  • Permissions are Crucial: Always ensure you have explicit permission to scrape a website. Respect robots.txt and the website's Terms of Service. Unauthorized scraping can lead to legal issues and IP bans.
  • Stealth Mode Usage: Use stealth mode judiciously. It is resource-intensive and should only be employed when necessary to bypass anti-bot measures on authorized sites, not for circumventing paywalls or accessing restricted content.
  • Installation Confirmation: Be aware that the installation requires downloading significant data and browser binaries. Always confirm with the user before executing the installation commands.
  • Adaptive Scraping: For recurring scraping tasks, leverage Scrapling's adaptive scraping features (auto_save=True or adaptive) to persist element fingerprints locally, improving efficiency and resilience to minor site changes.
  • MCP Server: The scrapling mcp command starts a local network service. Only use this if explicitly needed and you trust the environment, as it exposes a local HTTP server.
  • Resource Intensive: stealth and dynamic modes are more resource-intensive than http mode due to the headless browser execution.
  • Large-Scale Crawls: For very large-scale scraping operations, consider using Scrapling's Spider API, as detailed in its official documentation.
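The advice to respect robots.txt can be checked in code before any fetch. This is a minimal sketch with Python's standard urllib.robotparser. The helper name allowed_by_robots and the sample rules are illustrative only, and a pass here does not replace reading the site's Terms of Service.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules, not taken from any real site.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(allowed_by_robots(SAMPLE_ROBOTS, "https://example.com/articles/1"))
    print(allowed_by_robots(SAMPLE_ROBOTS, "https://example.com/private/report"))
```

In practice the robots.txt text would first be downloaded from the target site's root, for example with the http mode.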

Metadata

Author: @damirikys
Stars: 3376
Views: 34
Updated: 2026-03-24

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-damirikys-scrapling-fetcher": {
      "enabled": true,
      "auto_update": true
    }
  }
}
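If clawhub.json already lists other plugins, pasting the fragment by hand risks overwriting them. The sketch below merges the entry into an existing configuration instead. The plugin key and its two settings come from the fragment above, while the function name enable_plugin and the placeholder entry some-other-plugin are hypothetical.

```python
import json

PLUGIN_ID = "official-damirikys-scrapling-fetcher"


def enable_plugin(config):
    """Return a copy of config with the Scrapling plugin entry added."""
    updated = dict(config)
    # Copy the plugins mapping so the caller's dictionary is left untouched.
    plugins = dict(updated.get("plugins", {}))
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    updated["plugins"] = plugins
    return updated


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated plugin.
    existing = {"plugins": {"some-other-plugin": {"enabled": False}}}
    print(json.dumps(enable_plugin(existing), indent=2))
```

To apply it to a real file, load clawhub.json with json.load, pass the result through enable_plugin, and write it back with json.dump.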

Tags(AI)

#web-scraping #data-extraction #anti-bot #cloudfare-bypass
Safety Score: 3/5

Flags: network-access, file-write, file-read, code-execution