scrapling
Web scraping using Scrapling — a Python framework with anti-bot bypass (Cloudflare Turnstile, fingerprint spoofing), adaptive element tracking, stealth headless browser, and full CSS/XPath extraction. Use when web_fetch fails (Cloudflare, JS-rendered pages), or when extracting structured data from websites (prices, articles, lists). Supports HTTP, stealth, and full browser modes. Source: github.com/D4Vinci/Scrapling (PyPI: scrapling). Only use on sites you have permission to scrape.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/damirikys/scrapling-fetcher
What This Skill Does
The Scrapling skill provides web scraping for cases where traditional web fetchers fail. It is built on the Scrapling Python framework, which offers anti-bot bypass (including Cloudflare Turnstile and fingerprint spoofing), adaptive element tracking, a stealth headless browser, and full CSS/XPath extraction. Scrapling supports three fetcher modes:
- http: standard HTTP requests with TLS fingerprint spoofing.
- stealth: a headless browser with anti-detect features for pages protected by anti-bot measures.
- dynamic: a full browser instance for JavaScript-heavy Single Page Applications (SPAs).
The skill is suited to extracting structured data such as prices, articles, and lists, especially when standard fetching fails because of Cloudflare protection or client-side rendering. For more complex scraping logic, Scrapling can also be used inline from Python, and an optional MCP server is available for AI-native scraping.
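The three modes map naturally onto Scrapling's fetcher classes. The sketch below assumes Scrapling 0.3 or later, where Fetcher, StealthyFetcher, and DynamicFetcher live in scrapling.fetchers (older releases used different names); the helpers fetcher_name and fetch_page are hypothetical and not part of the skill or the library. Scrapling is imported lazily so the mode validation works even before installation.

```python
# Hypothetical mapping from the skill's mode names to Scrapling fetcher classes
# (assumes Scrapling 0.3+; check the class names against your installed version).
FETCHER_FOR_MODE = {
    "http": "Fetcher",             # plain HTTP with TLS fingerprint spoofing
    "stealth": "StealthyFetcher",  # headless browser with anti-detect features
    "dynamic": "DynamicFetcher",   # full browser for JavaScript-heavy SPAs
}


def fetcher_name(mode: str) -> str:
    """Return the Scrapling class name for a mode, rejecting unknown modes."""
    try:
        return FETCHER_FOR_MODE[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(FETCHER_FOR_MODE)}"
        ) from None


def fetch_page(url: str, mode: str = "http"):
    """Fetch url with the fetcher for mode (hypothetical helper)."""
    from scrapling import fetchers  # lazy import: only needed when actually fetching

    cls = getattr(fetchers, fetcher_name(mode))
    # Fetcher exposes HTTP-verb methods such as get(); the browser fetchers use fetch().
    return cls.get(url) if mode == "http" else cls.fetch(url)
```

Keeping the mode table separate from the fetch call makes it easy to reject a typo such as "stelth" before any browser is launched.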
Installation
The Scrapling skill requires a one-time installation of its dependencies: the scrapling package with all its extras, plus a browser binary. Because this downloads approximately 200 MB, the user is asked to confirm before it runs. The commands are:
pip install "scrapling[all]"
patchright install chromium
The scrapling[all] extra installs the optional dependencies, including patchright (a stealth fork of Playwright) and curl_cffi. The quotes keep shells such as zsh from treating the square brackets as a glob pattern. patchright install chromium then downloads the Chromium browser that the stealth and dynamic fetcher modes require.
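Before prompting the user, it can help to check whether the installation is needed at all. This is a minimal sketch: the module names come from the installation notes above, while missing_modules and install_plan are hypothetical helpers. It only detects Python packages; it does not detect whether the Chromium binary has already been downloaded.

```python
import importlib.util

# Python packages named in the installation notes.
REQUIRED_MODULES = ("scrapling", "patchright", "curl_cffi")

# The two installation commands, as argument lists for subprocess.
INSTALL_COMMANDS = (
    ["pip", "install", "scrapling[all]"],
    ["patchright", "install", "chromium"],
)


def missing_modules(modules=REQUIRED_MODULES) -> list[str]:
    """Return the top-level modules that cannot be found on the Python path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def install_plan() -> list[list[str]]:
    """Return the commands to show the user for confirmation, or [] if none are needed."""
    return [list(cmd) for cmd in INSTALL_COMMANDS] if missing_modules() else []
```

If the user confirms, each command can be run with subprocess.run(cmd, check=True); passing an argument list means scrapling[all] reaches pip as a single argument, so no shell quoting is needed.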
Use Cases
Scrapling is particularly useful in situations where standard web requests are insufficient or blocked:
- Cloudflare Protection: When web_fetch encounters Cloudflare challenges, 403 Forbidden, or 429 Too Many Requests errors, the --mode stealth option can bypass these protections by using a headless browser with anti-detect features. Note: do this only on sites where scraping is permitted.
- JavaScript-Rendered Content: For websites that rely heavily on JavaScript to render their content (e.g., Single Page Applications), the --mode dynamic option, which uses a full browser instance, is needed to fetch and parse the complete page.
- Structured Data Extraction: To extract specific data points such as product prices, article text, or lists of items from a webpage, use Scrapling's CSS selector or XPath extraction. The --selector argument pinpoints the exact elements to extract.
- API Fallback: If web_fetch fails because of complex site structures or anti-bot measures, Scrapling provides a robust alternative for retrieving web content.
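The rules above for escalating from web_fetch can be sketched as a small decision helper. This is an illustration under stated assumptions: choose_mode and the challenge-page marker strings are hypothetical heuristics, not part of Scrapling or the skill, and whether a page needs JavaScript is left for the caller to decide.

```python
# Hypothetical substrings that commonly appear on Cloudflare challenge pages.
CHALLENGE_MARKERS = ("cf-chl", "challenge-platform", "Just a moment...")


def choose_mode(status: int, body: str, needs_js: bool = False) -> str:
    """Suggest 'http', 'stealth', or 'dynamic' for retrying a failed fetch."""
    if status in (403, 429) or any(marker in body for marker in CHALLENGE_MARKERS):
        # Anti-bot block: retry with the headless anti-detect browser.
        return "stealth"
    if needs_js:
        # Content is rendered client-side: a full browser is needed.
        return "dynamic"
    # Otherwise plain HTTP with TLS fingerprint spoofing should suffice.
    return "http"
```

Checking the anti-bot signals first reflects the ordering in the list: a blocked page cannot be rendered usefully by any mode until the block is bypassed.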
Example Prompts
- "Scrape the headlines from the homepage of example.com using CSS selectors and provide the output in JSON format."
- "Fetch the main article text from this URL [URL] using stealth mode to bypass Cloudflare, and output it as plain text."
- "Extract all product prices from the category page at [URL], which is heavily JavaScript-rendered. Use the dynamic mode and output the results as a list."
Tips & Limitations
- Permissions are Crucial: Always ensure you have explicit permission to scrape a website. Respect robots.txt and the website's Terms of Service. Unauthorized scraping can lead to legal issues and IP bans.
- Stealth Mode Usage: Use stealth mode judiciously. It is resource-intensive and should be used only when necessary to bypass anti-bot measures on authorized sites, never to circumvent paywalls or access restricted content.
- Installation Confirmation: The installation downloads a significant amount of data, including browser binaries. Always confirm with the user before running the installation commands.
- Adaptive Scraping: For recurring scraping tasks, use Scrapling's adaptive scraping features (auto_save=True or adaptive=True) to persist element fingerprints locally, which makes selectors more resilient to minor site changes.
- MCP Server: The scrapling mcp command starts a local network service that exposes an HTTP server. Use it only if explicitly needed and you trust the environment.
- Resource Intensive: stealth and dynamic modes are more resource-intensive than http mode because they run a browser.
- Large-Scale Crawls: For very large scraping operations, consider Scrapling's Spider API, as described in its official documentation.
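Respecting robots.txt can be checked with Python's standard library before any Scrapling mode is used. This minimal sketch parses robots.txt lines that are already in memory; the allowed helper and the example rules are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines: list[str], url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Illustrative robots.txt content.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
```

Against a live site you would instead call parser.set_url("https://example.com/robots.txt") and parser.read(), which needs network access. A robots.txt allowance does not replace permission under the site's Terms of Service.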
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-damirikys-scrapling-fetcher": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, code-execution
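If clawhub.json already contains other settings, the plugin entry from the snippet above can be merged in rather than pasted over the file. This is a sketch under assumptions: enable_plugin is a hypothetical helper, and it assumes clawhub.json is plain JSON with a top-level "plugins" object as shown.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-damirikys-scrapling-fetcher"


def enable_plugin(config_path: Path, auto_update: bool = True) -> dict:
    """Write the plugin entry into clawhub.json, keeping all other settings."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    plugins = config.setdefault("plugins", {})
    # Replaces any previous entry for this plugin; other plugins are untouched.
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": auto_update}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```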
Related Skills
demo-slap
Generate CS2 highlights and fragmovies from demos using the Demo-Slap API, with optional Leetify integration and Demo-Slap match history fallback to select recent matches. Use when a user asks to record a highlight, render a clip, make a fragmovie, clip a round, or turn a CS2 demo into MP4 video.
leetify
Get CS2 player statistics, match analysis, and gameplay insights from Leetify API. Supports player comparison and season stats. Use for stat queries and demo analysis.
markitdown
MarkItDown is a Python utility from Microsoft for converting various files (PDF, Word, Excel, PPTX, Images, Audio) to Markdown. Useful for extracting structured text for LLM analysis.
faster-whisper
Local speech-to-text using faster-whisper. High-performance transcription with GPU acceleration support. Includes word-level timestamps and distilled models. Use when asked to "transcribe audio", "whisper", or "speech to text".