Scrapling Web Scraping — MCP-Native Guidance

Guidance Layer + MCP Integration
Use this skill for strategy and patterns. For execution, call Scrapling's MCP server via mcporter.

Quick Start (MCP)

1. Install Scrapling with MCP support

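# Quote the extras so shells such as zsh do not treat the brackets as a glob
pip install "scrapling[mcp]"
# Or, to include the browser-based fetchers:
pip install "scrapling[mcp,playwright]"
python -m playwright install chromium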

2. Add to OpenClaw MCP config

{
  "mcpServers": {
    "scrapling": {
      "command": "python",
      "args": ["-m", "scrapling.mcp"]
    }
  }
}
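
If the config file already lists other servers, the entry can be merged programmatically instead of pasted by hand. The following is a minimal Python sketch, assuming the config is a JSON file with a top-level `mcpServers` object as in the snippet above. The helper names `add_scrapling_server` and `update_config_file` are hypothetical, and the location of the OpenClaw config file is not given here, so the path is left to the caller.

```python
import json
from pathlib import Path

# Server entry copied from the config snippet above.
SCRAPLING_ENTRY = {"command": "python", "args": ["-m", "scrapling.mcp"]}


def add_scrapling_server(config: dict) -> dict:
    """Return a copy of `config` with the scrapling server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["scrapling"] = dict(SCRAPLING_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read, merge, and rewrite a JSON config file (the path is an assumption)."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_scrapling_server(config), indent=2) + "\n")
```

Existing servers are preserved because only the `scrapling` key is written.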

3. Call via mcporter

mcporter call scrapling fetch_page --url "https://example.com"
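
When the call is driven from a script rather than typed by hand, it helps to build the argument list separately from running it. The sketch below assumes the `--key value` flag style seen in this document's examples. The helper names `build_mcporter_command` and `call_tool` are hypothetical and not part of mcporter or Scrapling.

```python
import json
import subprocess


def build_mcporter_command(tool: str, **options) -> list[str]:
    """Build the argv for `mcporter call scrapling <tool> --key value ...`."""
    argv = ["mcporter", "call", "scrapling", tool]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        # Booleans are rendered as lowercase true/false, as in the CLI examples.
        text = json.dumps(value) if isinstance(value, bool) else str(value)
        argv += [flag, text]
    return argv


def call_tool(tool: str, **options) -> str:
    """Run the command and return stdout; raises if mcporter exits non-zero."""
    result = subprocess.run(
        build_mcporter_command(tool, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing a list to `subprocess.run` avoids shell quoting problems with URLs and selectors.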

Execution vs Guidance

| Task | Tool | Example |
|------|------|---------|
| Fetch a page | mcporter | `mcporter call scrapling fetch_page --url URL` |
| Extract with CSS | mcporter | `mcporter call scrapling css_select --selector ".title::text"` |
| Which fetcher to use? | This skill | See "Fetcher Selection Guide" below |
| Anti-bot strategy? | This skill | See "Anti-Bot Escalation Ladder" |
| Complex crawl patterns? | This skill | See "Spider Recipes" |

Fetcher Selection Guide

┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│   Fetcher       │────▶│ DynamicFetcher   │────▶│ StealthyFetcher  │
│   (HTTP)        │     │ (Browser/JS)     │     │ (Anti-bot)       │
└─────────────────┘     └──────────────────┘     └──────────────────┘
     Fastest              JS-rendered               Cloudflare,
     Static pages         SPAs, React/Vue           Turnstile, etc.

Decision Tree

Static HTML? → Fetcher (no browser startup; roughly 10-100x faster than the browser-based fetchers)
Need JS execution? → DynamicFetcher
Getting blocked? → StealthyFetcher
Need cookies or login state across requests? → Use the Session variants (e.g. FetcherSession)
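
The decision tree can be written as a small function, which is handy when an agent has to pick a fetcher from observed page traits. This is a sketch only: `choose_fetcher` is a hypothetical helper, and the session class names other than `FetcherSession` (here `DynamicSession` and `StealthySession`) are assumptions to check against the installed Scrapling release.

```python
def choose_fetcher(needs_js: bool = False, blocked: bool = False,
                   needs_session: bool = False) -> str:
    """Map the decision tree to a Scrapling fetcher class name."""
    if blocked:
        base = "Stealthy"  # blocking outranks everything else
    elif needs_js:
        base = "Dynamic"
    else:
        # Plain HTTP is enough for static HTML.
        return "FetcherSession" if needs_session else "Fetcher"
    # Session class names below are assumed, not confirmed by this document.
    return f"{base}Session" if needs_session else f"{base}Fetcher"
```

Checking `blocked` first reflects the ladder: a blocked site needs the stealth fetcher whether or not it also renders with JavaScript.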

MCP Fetch Modes

fetch_page — HTTP fetcher
fetch_dynamic — Browser-based with Playwright
fetch_stealthy — Anti-bot bypass mode

Anti-Bot Escalation Ladder

Level 1: Polite HTTP

# MCP call: fetch_page with options
{
  "url": "https://example.com",
  "headers": {"User-Agent": "..."},
  "delay": 2.0
}
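
This document does not say whether `fetch_page` applies the `delay` option itself, so a client-side throttle is a reasonable safeguard when issuing many calls. The class below is a minimal sketch with a hypothetical name, `PoliteThrottle`. The clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, delay: float, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if needed; return how long was slept, in seconds."""
        now = self._clock()
        pause = 0.0
        if self._last is not None:
            pause = max(0.0, self.delay - (now - self._last))
        if pause:
            self._sleep(pause)
        self._last = now + pause
        return pause
```

Call `throttle.wait()` immediately before each request. The first call never sleeps.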

Level 2: Session Persistence

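from scrapling.fetchers import FetcherSession
# Use one session so cookies and state persist across requests
with FetcherSession(impersonate="chrome") as session:  # TLS fingerprint spoofing
    page = session.get("https://example.com")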

Level 3: Stealth Mode

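# MCP: fetch_stealthy
from scrapling.fetchers import StealthyFetcher

url = "https://example.com"
page = StealthyFetcher.fetch(
    url,
    headless=True,
    solve_cloudflare=True,  # Auto-solve Turnstile
    network_idle=True,  # Wait until network activity settles
)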

Level 4: Proxy Rotation

See references/proxy-rotation.md
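
The ladder as a whole amounts to "try the cheapest level first and move up only on failure". The sketch below shows that control flow with injected fetch callables, so it does not depend on any particular Scrapling or MCP signature. The name `escalate` and the rule that only a 2xx status counts as success are assumptions for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetch attempt returns (status_code, body); callers supply the real fetchers.
FetchFn = Callable[[str], Tuple[int, str]]


def escalate(
    url: str, levels: Iterable[Tuple[str, FetchFn]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each level in order; return (level_name, body) of the first success."""
    for name, fetch in levels:
        try:
            status, body = fetch(url)
        except Exception:
            continue  # treat errors as a failed level and move up the ladder
        if 200 <= status < 300:
            return name, body
    return None, None
```

Each level would wrap one of the calls described above, for example a `fetch_page` call for Level 1 and a `fetch_stealthy` call for Level 3. Returning the level name makes it possible to remember which level a given site needs.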

Adaptive Scraping (Anti-Fragile)

Scrapling can survive website redesigns using adaptive selectors:

# First run — save fingerprints
products = page.css('.product', auto_save=True)

# Later runs — auto-relocate if DOM changed
products = page.css('.product', adaptive=True)

MCP usage:

mcporter call scrapling css_select \
  --selector ".product" \
  --adaptive true \
  --auto-save true
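
To make the idea of saving fingerprints and relocating elements concrete, here is a toy illustration of the general technique. It is not Scrapling's actual algorithm: the dict-based element shape, the `fingerprint` and `relocate` helpers, and the 0.5 threshold are all hypothetical, and similarity is computed with the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def fingerprint(element: dict) -> str:
    """Flatten the stable traits of an element into one comparable string."""
    classes = " ".join(sorted(element.get("classes", [])))
    return f"{element['tag']}|{classes}|{element.get('text', '')}"


def relocate(saved: str, candidates: list[dict], threshold: float = 0.5):
    """Return the candidate most similar to the saved fingerprint, or None."""
    best, best_score = None, threshold
    for candidate in candidates:
        score = SequenceMatcher(None, saved, fingerprint(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best
```

On the first run the fingerprint is stored. On later runs, if the original selector matches nothing, the stored fingerprint is compared against the elements of the new DOM and the closest match above the threshold is used.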
