playwright-scraper-skill
Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.
Why use this skill?
Master web scraping with the Playwright-based OpenClaw skill. Includes stealth modes for Cloudflare-protected sites and simple scripts for dynamic content.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/waisimon/playwright-scraper-skill
What This Skill Does
The playwright-scraper-skill is a Playwright-powered skill for OpenClaw that extracts data from pages a plain HTTP fetch cannot handle. It takes a tiered approach: a lightweight, speed-focused script for standard dynamic pages, and a stealth-enabled module for websites protected by anti-bot services such as Cloudflare. It is designed to cope with JavaScript-rendered content, headless-browser detection, and user-agent fingerprinting.
Installation
To get started, ensure you have the OpenClaw environment active. Navigate to the skill directory and install the necessary dependencies:
cd playwright-scraper-skill
npm install
npx playwright install chromium
Ensure your system meets the requirements for running headless browsers, as this skill relies on the Chromium engine to simulate a real user environment.
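A quick launch check is one way to confirm that the browser install worked. The sketch below is not part of the skill: it assumes that `npm install` pulled in the `playwright` package, and the file name `smoke-test.js` and the `--run` flag are hypothetical.

```javascript
// smoke-test.js (hypothetical file, not shipped with the skill)
// Launches headless Chromium, renders a tiny inline page, and reports its title.

async function smokeTest() {
  // Loaded lazily so a missing package fails at call time with a clear error.
  // Assumption: the skill's dependencies include the "playwright" package.
  const { chromium } = require("playwright");
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // No network needed: the page content is supplied inline.
    await page.setContent("<title>ok</title><h1>Chromium works</h1>");
    return await page.title();
  } finally {
    await browser.close();
  }
}

// Run only when asked: node smoke-test.js --run
if (process.argv.includes("--run")) {
  smokeTest()
    .then((title) => console.log(`Chromium launched, page title: ${title}`))
    .catch((err) => {
      console.error(`Chromium failed to launch: ${err.message}`);
      process.exitCode = 1;
    });
}
```

If the launch fails, the error message usually names the missing browser binary or system library, which is a faster signal than debugging a full scrape.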
Use Cases
This skill is built for users requiring high-fidelity data extraction.
- For simple dynamic content: Use playwright-simple.js when elements are rendered via React, Vue, or other client-side frameworks.
- For restricted environments: Deploy playwright-stealth.js when you encounter 403 Forbidden errors or anti-scraping challenges. The stealth implementation mimics human behavior through randomized delays and realistic headers.
- For niche platforms: Use specialized handlers like deep-scraper for YouTube or reddit-scraper for social platforms to ensure compliant and structured output.
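The skill's own scripts are not reproduced on this page, so the following is only a sketch of the two techniques named above (randomized delays and realistic headers), written against Playwright's public API. The function names `randomDelayMs`, `buildContextOptions`, and `fetchRendered`, the user-agent string, and the header values are all illustrative assumptions, not the contents of playwright-stealth.js.

```javascript
// Illustrative only: not the skill's playwright-stealth.js.

// Pick a delay between minMs and maxMs (inclusive) so requests lack a fixed cadence.
// rng is injectable to make the function easy to test.
function randomDelayMs(minMs, maxMs, rng = Math.random) {
  return Math.floor(rng() * (maxMs - minMs + 1)) + minMs;
}

// Browser-context options that resemble an ordinary desktop browser.
// Every value here is an example assumption; adjust them for your target site.
function buildContextOptions() {
  return {
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    locale: "en-US",
    viewport: { width: 1366, height: 768 },
    extraHTTPHeaders: { "Accept-Language": "en-US,en;q=0.9" },
  };
}

// Load a page in headless Chromium and return its title and rendered HTML.
async function fetchRendered(url) {
  const { chromium } = require("playwright"); // assumed dependency
  const browser = await chromium.launch({ headless: true });
  try {
    const context = await browser.newContext(buildContextOptions());
    const page = await context.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    // Pause for a random interval, which also gives client-side rendering time to finish.
    await page.waitForTimeout(randomDelayMs(1000, 3000));
    return { title: await page.title(), html: await page.content() };
  } finally {
    await browser.close();
  }
}
```

A call such as `fetchRendered("https://example.com/pricing")` would return the rendered HTML for further parsing. Custom headers and delays alone are unlikely to get past every anti-bot service, so treat this as a starting point rather than a guarantee.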
Example Prompts
- "OpenClaw, use the stealth scraper to fetch the latest hot threads from https://m.discuss.com.hk/ and summarize the top three topics."
- "Can you please scrape the pricing table from the dynamic dashboard at https://example.com/pricing using the simple playwright script?"
- "The previous fetch attempt returned an empty page for this site; please try again using the playwright-stealth module to bypass the security wall."
Tips & Limitations
- Efficiency First: Always attempt a basic web_fetch before resorting to Playwright; it is significantly faster and consumes fewer system resources.
- Rate Limiting: Even with stealth enabled, excessive scraping can trigger IP-based rate limiting. Consider adding manual delays or proxy rotations if you are scraping at scale.
- Maintenance: Web interfaces change frequently. If a scraper fails, check for DOM structure changes in the target website before debugging the automation scripts themselves. Always keep your browser binaries updated via npx playwright install.
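The first two tips can be combined into a small fallback routine. This is a sketch under stated assumptions: Node 18+ global `fetch` stands in for `web_fetch`, the `needsBrowser` heuristic and its 200-character threshold are invented for illustration, and `renderWithBrowser` is a hypothetical callback that you supply (for example, a wrapper around one of the Playwright scripts).

```javascript
// Decide whether a plain HTTP response is usable or a real browser is needed.
// The status codes and length threshold are illustrative assumptions.
function needsBrowser(status, body) {
  if (status === 403 || status === 429 || status === 503) return true; // blocked or challenged
  const text = body
    .replace(/<script[\s\S]*?<\/script>/gi, "") // drop inline scripts
    .replace(/<[^>]+>/g, "") // drop remaining tags
    .trim();
  return text.length < 200; // almost no visible text: likely rendered client-side
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Try the cheap HTTP path first; call renderWithBrowser(url) only when it looks necessary.
async function fetchWithFallback(url, renderWithBrowser, httpFetch = fetch) {
  const res = await httpFetch(url);
  const body = await res.text();
  if (!needsBrowser(res.status, body)) return { via: "http", html: body };
  return { via: "browser", html: await renderWithBrowser(url) };
}

// Fetch URLs one at a time, pausing between requests to stay under rate limits.
async function fetchAll(urls, renderWithBrowser, delayMs = 2000, httpFetch = fetch) {
  const results = [];
  for (const [i, url] of urls.entries()) {
    if (i > 0) await sleep(delayMs);
    results.push(await fetchWithFallback(url, renderWithBrowser, httpFetch));
  }
  return results;
}
```

Processing URLs sequentially with a fixed pause is the simplest form of throttling; at larger scale a randomized delay or a proxy pool may be more appropriate.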
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-waisimon-playwright-scraper-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, code-execution