Axelhu Playwright Scrape
Skill by axelhu
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/axelhu/axelhu-playwright-scrape
Playwright Scrape (axelhu-playwright-scrape)
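A skill for scraping dynamic web pages (JavaScript-rendered content). It is built on Playwright and the system Chrome, and supports three modes.
Requirements
- Node.js (with the playwright package)
- Google Chrome installed at /usr/bin/google-chrome
- The DISPLAY environment variable (for GUI mode)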
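The requirements above can be checked before running anything. The following is a minimal sketch, not part of the skill: `checkEnvironment` and its options are hypothetical names, and the checks only mirror the three items in the list.

```javascript
// Hypothetical pre-flight check for the requirements listed above.
const fs = require('fs');

// Returns true if the named package can be resolved from this script.
function canResolve(name) {
  try {
    require.resolve(name);
    return true;
  } catch {
    return false;
  }
}

function checkEnvironment({
  chromePath = '/usr/bin/google-chrome',
  env = process.env,
  exists = fs.existsSync,
  hasModule = canResolve,
  needsDisplay = true, // only gui mode needs a display
} = {}) {
  const problems = [];
  if (!hasModule('playwright')) {
    problems.push('playwright package is not installed');
  }
  if (!exists(chromePath)) {
    problems.push(`Chrome not found at ${chromePath}`);
  }
  if (needsDisplay && !env.DISPLAY) {
    problems.push('DISPLAY is not set (required for gui mode)');
  }
  return problems; // an empty array means everything was found
}
```

Calling `checkEnvironment()` with no arguments inspects the real machine; the injectable `exists`, `env`, and `hasModule` options exist only so the function can be exercised without touching the filesystem.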
Installation:
cd /home/axelhu/.openclaw/workspace
npm install playwright
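Starting the Chrome debugging instance (critical!)
First-time setup (one time only)
Create a Chrome wrapper so that every google-chrome command opens the debugging port by default: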
mkdir -p ~/bin
cat > ~/bin/google-chrome << 'EOF'
#!/bin/bash
exec /usr/bin/google-chrome --remote-debugging-port=9222 "$@"
EOF
chmod +x ~/bin/google-chrome
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
export PATH="$HOME/bin:$PATH"
Start the Chrome debugging instance
# DISPLAY=:0 is required; otherwise Chrome cannot find a display in an exec session
DISPLAY=:0 google-chrome \
--remote-debugging-port=9222 \
--user-data-dir=$HOME/.config/google-chrome/Default \
--new-window \
--no-sandbox \
> /tmp/chrome-debug.log 2>&1 &
# Verify that startup succeeded
sleep 3 && curl -s http://localhost:9222/json/version | head -c 50
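Note:
--user-data-dir=$HOME/.config/google-chrome/Default uses your default Chrome profile, so existing login state is reused. If you do not want to affect your everyday Chrome, point it at a separate directory instead.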
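The `sleep 3 && curl` check above assumes Chrome is ready within three seconds. As an alternative, the sketch below polls the same `/json/version` endpoint until it answers. `waitForDebugEndpoint` and its options are hypothetical names, and the default `fetchImpl` assumes Node.js 18 or later, where `fetch` is global.

```javascript
// Hypothetical helper: poll Chrome's DevTools HTTP endpoint until it responds.
async function waitForDebugEndpoint(
  baseUrl = 'http://localhost:9222',
  { retries = 10, delayMs = 500, fetchImpl = globalThis.fetch } = {}
) {
  let lastError;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const resp = await fetchImpl(`${baseUrl}/json/version`);
      if (resp.ok) {
        // The JSON body includes fields such as "Browser" and "webSocketDebuggerUrl".
        return await resp.json();
      }
      lastError = new Error(`HTTP ${resp.status}`);
    } catch (err) {
      lastError = err; // typically "connection refused" while Chrome is still starting
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Chrome debug endpoint not reachable: ${lastError?.message}`);
}
```

With the defaults this waits up to roughly five seconds, which keeps the fast path fast and still tolerates a slow start.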
Quick-start script
bash /home/axelhu/.openclaw/skills/axelhu-playwright-scrape/scripts/start-chrome-debug.sh
Usage
Basic command
node skills/axelhu-playwright-scrape/scripts/playwright-scrape.js <URL> [mode]
mode: gui (default), headless, or stealth
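To illustrate the command-line contract above, here is a minimal sketch of how the two positional arguments could be validated. It is not the code in playwright-scrape.js; `parseScrapeArgs` and `MODES` are hypothetical names.

```javascript
// Hypothetical argument parser for: playwright-scrape.js <URL> [mode]
const MODES = ['gui', 'headless', 'stealth'];

function parseScrapeArgs(argv) {
  const [url, mode = 'gui'] = argv; // gui is the documented default
  if (!url) {
    throw new Error('Usage: playwright-scrape.js <URL> [mode]');
  }
  new URL(url); // throws a TypeError if the URL is malformed
  if (!MODES.includes(mode)) {
    throw new Error(`Unknown mode "${mode}"; expected one of: ${MODES.join(', ')}`);
  }
  return { url, mode };
}
```

In a script this would be called as `parseScrapeArgs(process.argv.slice(2))`.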
Quick invocation templates
Connect to an already running Chrome (gui mode):
const { chromium } = require('/home/axelhu/.openclaw/workspace/node_modules/playwright');
const browser = await chromium.connectOverCDP('http://localhost:9222');
const page = browser.contexts()[0].pages()[0]; // reuse an existing tab (contexts() is synchronous)
Open a new tab (new context):
const ctx = await browser.newContext();
const page = await ctx.newPage();
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
await page.waitForTimeout(4000); // wait for JavaScript rendering
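The template above always pauses for four seconds. When a selector for the rendered content is known, waiting on it can return earlier. The sketch below shows one way to combine the two approaches; `openAndWait` and its options are hypothetical names, while `page.goto`, `page.waitForSelector`, and `page.waitForTimeout` are Playwright Page methods.

```javascript
// Hypothetical helper: navigate, then wait for rendered content.
// `page` is a Playwright Page obtained as in the templates above.
async function openAndWait(page, url, { selector, navTimeout = 15000, renderTimeout = 4000 } = {}) {
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout: navTimeout });
  if (selector) {
    try {
      // Resolves as soon as the element is visible instead of always sleeping.
      await page.waitForSelector(selector, { timeout: renderTimeout });
      return 'selector';
    } catch {
      return 'timeout'; // the selector did not show up within renderTimeout
    }
  }
  await page.waitForTimeout(renderTimeout); // same fixed pause as the template
  return 'fixed-delay';
}
```

The return value only reports which wait strategy ended the call, so a caller can log or retry when the selector never appeared.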
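The three modes
| Mode | Use case | Characteristics |
|---|---|---|
| gui | Sites with anti-scraping measures (Zhihu, Xiaohongshu, Bilibili, etc.) | Reuses the user's Chrome, so the fingerprint is real and detection can be bypassed |
| headless | Ordinary dynamic sites | Runs in the background; no display needed |
| stealth | Targets with moderate anti-scraping measures | Launches with anti-detection arguments; suited to cases that do not need a real browser |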
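The table above can be read as a mapping from mode to how the browser is obtained. The sketch below is one possible mapping, not the skill's actual implementation: `browserPlanForMode` is a hypothetical name, and the specific stealth argument (`--disable-blink-features=AutomationControlled`) and the choice to run stealth headless are assumptions, since the source does not list its launch arguments.

```javascript
// Hypothetical mapping from mode to a Playwright connection plan.
const CHROME_PATH = '/usr/bin/google-chrome';

function browserPlanForMode(mode) {
  switch (mode) {
    case 'gui':
      // Attach to the Chrome instance started with --remote-debugging-port=9222.
      return { method: 'connectOverCDP', endpoint: 'http://localhost:9222' };
    case 'headless':
      return { method: 'launch', options: { headless: true, executablePath: CHROME_PATH } };
    case 'stealth':
      // Assumed flag: hides the automation hint exposed by Blink.
      return {
        method: 'launch',
        options: {
          headless: true,
          executablePath: CHROME_PATH,
          args: ['--disable-blink-features=AutomationControlled'],
        },
      };
    default:
      throw new Error(`Unknown mode "${mode}"`);
  }
}
```

A caller would then use `chromium.connectOverCDP(plan.endpoint)` when `plan.method` is `connectOverCDP`, and `chromium.launch(plan.options)` otherwise.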
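Best practices for sites that require login
How it works
gui mode reuses the user's local Chrome profile, so cookies and login state carry over directly and no extra authentication is needed.
Sites that require login (tested and working)
| Platform | What can be scraped after login |
|---|---|
| Xiaohongshu | Full post text, search results, recommendation feed |
| Zhihu | Topic pages, trending list, full answer text |
| Bilibili | Rankings, video information, following list |
| Douban | Group discussions, featured content |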
Using the Bilibili API directly (no page parsing needed)
// Get the SESSDATA cookie of the currently logged-in user
const cookies = await ctx.cookies(['https://bilibili.com']);
const sessdata = cookies.find(c => c.name === 'SESSDATA')?.value;
// Call the Bilibili API
const resp = await fetch('https://api.bilibili.com/x/relation/followings?pn=1&ps=20&vmid=UID', {
headers: { 'Cookie': 'SESSDATA=' + sessdata }
});
const data = await resp.json();
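In the snippet above, a missing SESSDATA cookie would silently send `SESSDATA=undefined`. The sketch below fails early instead and builds the request URL from parameters. `cookieHeader` and `followingsUrl` are hypothetical helper names; the endpoint and query parameters are the ones used above.

```javascript
// Hypothetical helpers for the Bilibili API call shown above.

// Build a Cookie header value from a Playwright cookie list, or fail loudly.
function cookieHeader(cookies, name) {
  const cookie = cookies.find((c) => c.name === name);
  if (!cookie || !cookie.value) {
    throw new Error(`Cookie ${name} not found; is the browser logged in?`);
  }
  return `${name}=${cookie.value}`;
}

// Build the followings URL; vmid is the numeric user ID (UID in the snippet above).
function followingsUrl(vmid, { pn = 1, ps = 20 } = {}) {
  const params = new URLSearchParams({ pn: String(pn), ps: String(ps), vmid: String(vmid) });
  return `https://api.bilibili.com/x/relation/followings?${params}`;
}
```

The request then becomes `fetch(followingsUrl(uid), { headers: { Cookie: cookieHeader(cookies, 'SESSDATA') } })`, where `uid` stands for the target user's ID.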
Output format
{
"url": "https://...",
"mode": "gui|headless|stealth",
"title": "page title",
"content": "body text (first 15,000 characters)",
"images": ["list of image URLs"],
"links": [{"text": "link text", "href": "link URL"}],
"loadTime": "1.23s"
}
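As an illustration of the output shape above, the following sketch assembles such an object from already extracted values. `buildResult` and `MAX_CONTENT_CHARS` are hypothetical names, and the truncation counts UTF-16 code units, which may differ from how the actual script counts characters.

```javascript
// Hypothetical assembly of the documented output object.
const MAX_CONTENT_CHARS = 15000;

function buildResult({ url, mode, title, text, images = [], links = [], elapsedMs }) {
  return {
    url,
    mode,
    title,
    content: text.slice(0, MAX_CONTENT_CHARS), // keep only the first 15,000 characters
    images,
    links,
    loadTime: `${(elapsedMs / 1000).toFixed(2)}s`, // milliseconds to "N.NNs"
  };
}
```

Passing the result to `JSON.stringify(result, null, 2)` yields JSON with the same keys as the format shown above.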
FAQ
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-axelhu-axelhu-playwright-scrape": {
"enabled": true,
"auto_update": true
}
}
}