Official Verified
ecommerce-scraper
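Scrapes data from dynamic e-commerce websites. Uses Playwright to handle JavaScript-rendered pages, with support for Cloudflare anti-bot measures, hidden API discovery, and paginated scraping. Suitable for: (1) scraping Chinese e-commerce sites such as JD.com, Taobao, and Pinduoduo, (2) scraping international e-commerce sites such as Amazon and eBay, (3) price monitoring and competitor analysis, (4) bulk collection of product data.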
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/chefroger/ecommerce-scraper
E-commerce Scraper
A scraping skill for dynamic e-commerce websites, built on Playwright to handle JavaScript rendering.
Quick Start
Basic Scraping
from playwright.sync_api import sync_playwright

def scrape_page(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        content = page.content()
        browser.close()
        return content
Full Example: Scraping a Product List
from playwright.sync_api import sync_playwright
from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

def _text(item, selector):
    """Return the stripped inner text of the first match, or None."""
    el = item.query_selector(selector)
    return el.inner_text().strip() if el else None

def _attr(item, selector, name):
    """Return an attribute of the first match, or None."""
    el = item.query_selector(selector)
    return el.get_attribute(name) if el else None

def scrape_ecommerce_products(url, max_pages=3):
    """Scrape product data from an e-commerce listing."""
    products = []
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=['--disable-blink-features=AutomationControlled']
        )
        context = browser.new_context(
            user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
        )
        page = context.new_page()
        # Hide navigator.webdriver, one signal that bot detection such as
        # Cloudflare checks (this alone does not guarantee a bypass)
        page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
        """)
        for page_num in range(1, max_pages + 1):
            print(f"Scraping page {page_num}...")
            # Assumes the URL has no query string yet and paginates via ?page=N
            page.goto(f"{url}?page={page_num}", wait_until="networkidle", timeout=30000)
            # Wait for the products to load
            try:
                page.wait_for_selector('.product-item, .goods-item, [class*="product"]', timeout=10000)
            except PlaywrightTimeoutError:
                pass  # nothing matched in time; extraction below may find no items
            # Extract product data
            items = page.query_selector_all('div[class*="product"], li[class*="item"], .goods-item')
            for item in items:
                try:
                    product = {
                        'title': _text(item, 'a[class*="title"], h3, .product-title'),
                        'price': _text(item, '[class*="price"], .sale-price, .real-price'),
                        'link': _attr(item, 'a', 'href'),
                        'image': _attr(item, 'img', 'src'),
                    }
                    if product['title']:
                        products.append(product)
                except Exception as e:
                    print(f"Extraction error: {e}")
            # Stop when there is no next-page control ("下一页" means "Next page")
            next_btn = page.query_selector('button:has-text("下一页"), a:has-text("下一页")')
            if not next_btn:
                break
        browser.close()
    return products
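The example above builds each page URL by appending `?page=N`, which produces an invalid URL when the listing URL already carries a query string. A minimal sketch of a safer helper follows. The name `with_page_param` and the default parameter name `page` are hypothetical, and the real pagination parameter varies by site.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_page_param(url, page_num, param="page"):
    """Return url with the pagination parameter set, keeping other query params."""
    parts = urlparse(url)
    # Drop any existing pagination parameter, keep everything else in order
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(page_num)))
    return urlunparse(parts._replace(query=urlencode(query)))


print(with_page_param("https://example.com/list?cat=phones", 2))
# → https://example.com/list?cat=phones&page=2
```

Under these assumptions, `page.goto(with_page_param(url, page_num), ...)` could replace the f-string in the loop.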
Core Techniques
1. Discover Hidden APIs (Most Important!)
Don't scrape the page directly; look for an API first:
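One possible way to look for such an API, sketched here under assumptions rather than taken from the skill itself, is to record the XHR/fetch responses a page triggers and keep those that return JSON. The function names `looks_like_api` and `discover_api_endpoints` are hypothetical, and the JSON content-type check is only a heuristic.

```python
def looks_like_api(resource_type, content_type):
    """Heuristic: XHR/fetch responses with a JSON content type are likely API calls."""
    return resource_type in ("xhr", "fetch") and "json" in (content_type or "").lower()


def discover_api_endpoints(url):
    """Load url with Playwright and return the URLs of JSON XHR/fetch responses."""
    # Imported here so looks_like_api can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    found = []

    def on_response(response):
        # Playwright exposes response header names in lower case
        content_type = response.headers.get("content-type", "")
        if looks_like_api(response.request.resource_type, content_type):
            found.append(response.url)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("response", on_response)  # register before navigating
        page.goto(url, wait_until="networkidle")
        browser.close()
    return found
```

Calling `discover_api_endpoints(listing_url)` on a listing page would then return candidate endpoints to inspect by hand. Whether any of them can be called directly depends on the site's authentication and request signing.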
Add to Configuration
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-chefroger-ecommerce-scraper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note
ClawKit audits metadata but not runtime behavior. Use with caution.