What This Skill Does

The playwright-scraper-skill is a Playwright-based web scraping skill for OpenClaw. Unlike a standard fetch tool, it takes a tiered approach: a lightweight, speed-focused script for ordinary dynamic pages, and a stealth-enabled module for sites protected by anti-bot services such as Cloudflare. It is built to handle JavaScript-rendered content, headless-browser detection, and user-agent fingerprinting.

Installation

To get started, ensure you have the OpenClaw environment active. Navigate to the skill directory and install the necessary dependencies:

cd playwright-scraper-skill
npm install
npx playwright install chromium

Ensure your system meets the requirements for running headless browsers, as this skill relies on the Chromium engine to simulate a real user environment.

Use Cases

This skill is meant for pages that a plain fetch cannot extract reliably. Pick the script that matches the obstacle:

For simple dynamic content: Use playwright-simple.js when elements are rendered via React, Vue, or other client-side frameworks.
For restricted environments: Deploy playwright-stealth.js when you encounter 403 Forbidden errors or anti-scraping challenges. The stealth implementation mimics human behavior through randomized delays and realistic headers.
For niche platforms: Use specialized handlers like deep-scraper for YouTube or reddit-scraper for social platforms to ensure compliant and structured output.
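
The tiers above can be read as an escalation ladder. The following is a minimal sketch of that idea, not the skill's actual code: pickTier, renderPage, and the 200-character "empty page" threshold are hypothetical names and values chosen for illustration. renderPage uses standard Playwright calls and loads the library lazily, so the selection helper can be used on its own.

```javascript
// Hypothetical helper: choose the next tier from the previous attempt's result.
// `previous` is { tier, status, text } or null when nothing has been tried yet.
// The 200-character threshold for "empty" is an arbitrary assumption.
function pickTier(previous) {
  if (!previous) return 'web_fetch'; // always start with the cheapest option

  const { tier, status, text } = previous;
  const blocked = status === 403;
  const empty = (text || '').trim().length < 200;

  if (!blocked && !empty) return null; // result looks usable, stop escalating

  if (blocked) {
    // A 403 suggests anti-bot protection: go straight to the stealth script.
    return tier === 'playwright-stealth.js' ? null : 'playwright-stealth.js';
  }

  // Empty but not blocked: the content is probably rendered client-side.
  if (tier === 'web_fetch') return 'playwright-simple.js';
  if (tier === 'playwright-simple.js') return 'playwright-stealth.js';
  return null; // stealth was already tried, nothing left to escalate to
}

// Illustrative stand-in for a "simple" render step (assumption: this is not
// the content of playwright-simple.js, only the general Playwright pattern).
async function renderPage(url) {
  const { chromium } = require('playwright'); // loaded lazily on first use
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so client-side frameworks can render.
    const response = await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });
    const text = await page.innerText('body');
    return { tier: 'playwright-simple.js', status: response ? response.status() : 0, text };
  } finally {
    await browser.close(); // release the browser even if navigation fails
  }
}
```

A caller would feed each attempt's result back into pickTier and stop when it returns null, either because the result is usable or because no tier remains.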

Example Prompts

"OpenClaw, use the stealth scraper to fetch the latest hot threads from https://m.discuss.com.hk/ and summarize the top three topics."
"Can you please scrape the pricing table from the dynamic dashboard at https://example.com/pricing using the simple playwright script?"
"The previous fetch attempt returned an empty page for this site; please try again using the playwright-stealth module to bypass the security wall."

Tips & Limitations

Efficiency First: Always attempt a basic web_fetch before resorting to Playwright; it is significantly faster and consumes fewer system resources.
Rate Limiting: Even with stealth enabled, excessive scraping can trigger IP-based rate limiting. Consider adding manual delays or proxy rotation if you are scraping at scale.
Maintenance: Web interfaces change frequently. If a scraper fails, check for DOM structure changes in the target website before debugging the automation scripts themselves. Always keep your browser binaries updated via npx playwright install.
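
As a rough illustration of the manual delays mentioned under Rate Limiting, the sketch below spaces out requests with a randomized pause. The names jitteredDelayMs, sleep, and scrapeSequentially, and the default 2000 ms base and 3000 ms jitter, are assumptions for this example and not options exposed by the skill. scrapeOne stands for whatever function actually fetches a single URL.

```javascript
// Compute a pause of baseMs plus a random share of jitterMs.
// `rng` is injectable so the helper can be checked deterministically.
function jitteredDelayMs(baseMs, jitterMs, rng = Math.random) {
  return Math.round(baseMs + rng() * jitterMs);
}

// Promise-based sleep built on setTimeout.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Scrape URLs one at a time, pausing between requests (not before the first).
// `scrapeOne` is a caller-supplied async function (hypothetical).
async function scrapeSequentially(urls, scrapeOne, { baseMs = 2000, jitterMs = 3000 } = {}) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) {
      await sleep(jitteredDelayMs(baseMs, jitterMs));
    }
    results.push(await scrapeOne(urls[i]));
  }
  return results;
}
```

Running requests sequentially with jitter keeps the request pattern irregular and the rate low. It does not help once an IP address is already rate limited, which is where proxy rotation comes in.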
