web-reader-pro
Advanced web content extraction skill for OpenClaw using multi-tier fallback strategy (Jina → Scrapling → WebFetch) with intelligent routing, caching, quality scoring, and domain learning. Use when: reading article content, extracting web page text, scraping dynamic JS-heavy pages, or fetching WeChat official account articles.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/0xcjl/web-reader-proWeb Reader Pro - OpenClaw Skill
Overview
Web Reader Pro is an advanced web content extraction skill for OpenClaw that uses a multi-tier fallback strategy with intelligent routing, caching, and quality assessment.
Features
1. Three-Tier Fallback Strategy
- Tier 1: Jina Reader API - Fast, reliable, best for most websites
- Tier 2: Scrapling + Playwright - Dynamic content rendering for JS-heavy sites
- Tier 3: WebFetch Fallback - Basic extraction for simple pages
2. Jina Quota Monitoring
- Tracks API call count with persistent counter
- Warning alerts when approaching quota limits
- Automatic fallback to lower-tier methods when quota exhausted
3. Smart Cache Layer
- Short-term caching (configurable TTL, default 1 hour)
- Cache key based on URL hash
- Reduces redundant API calls
4. Extraction Quality Scoring
- Scores based on: word count, title detection, content density
- Minimum quality threshold (default: 200 words + valid title)
- Auto-escalation to next tier if quality below threshold
5. Domain-Level Routing Learning
- Learns optimal extraction tier per domain
- Persists learned routes in local JSON database
- Adapts based on historical success rates
6. Retry with Exponential Backoff
- Configurable max retries per tier (default: 3)
- Exponential backoff: 1s, 2s, 4s, 8s...
- Respects rate limits and transient failures
Installation
# Install dependencies
pip install -r requirements.txt
# Install Scrapling (requires Node.js)
./scripts/install_scrapling.sh
# Or install Scrapling manually
npm install -g @scrapinghub/scrapling
Usage
Basic Usage
from scripts.web_reader_pro import WebReaderPro
reader = WebReaderPro()
result = reader.fetch("https://example.com")
print(result['title'])
print(result['content'])
Advanced Configuration
reader = WebReaderPro(
jina_api_key="your-jina-key", # Optional: set via env JINA_API_KEY
cache_ttl=3600, # Cache TTL in seconds (default: 3600)
quality_threshold=200, # Min word count for quality (default: 200)
max_retries=3, # Max retries per tier (default: 3)
enable_learning=True, # Enable domain learning (default: True)
scrapling_path="/usr/local/bin/scrapling" # Path to scrapling binary
)
Result Format
{
"title": "Page Title",
"content": "Extracted content in markdown...",
"url": "https://example.com",
"tier_used": "jina|scrapling|webfetch",
"quality_score": 85,
"cached": False,
"domain_learned_tier": "jina",
"extracted_at": "2024-01-01T00:00:00Z"
}
Environment Variables
Metadata
Not sure this is the right skill?
Describe what you want to build — we'll match you to the best skill from 16,000+ options.
Find the right skillPaste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-0xcjl-web-reader-pro": {
"enabled": true,
"auto_update": true
}
}
}Related Skills
autoresearch-pro
Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).
cjl-slides
Create stunning HTML presentations in 24 international design styles with strict design rules. Export to .pptx for PowerPoint editing. ## Design Philosophy - Aesthetic-first: each style is a curated visual system, not just colors - Font whitelist enforcement: prevents AI-generic typography - Container ratio lock (16:9): ensures consistent rendering across devices - Zero external dependencies: pure HTML/CSS/JS, works offline ## Usage 1. Activate → Select style by name/number or browse 24 options 2. Provide content (topic, audience, key points) or upload .pptx for conversion 3. Review generated HTML slides → request modifications (color/font/layout) 4. Optionally export .pptx for manual editing in PowerPoint ## Precautions - Fonts are restricted to a whitelist; custom fonts require adding to the allowed list first - Chart.js CDN is used; if blocked, falls back to jsdelivr mirror - HTML files must retain their relative structure when shared - .pptx export preserves exact colors and fonts but layout uses pptx-native elements ## Credits Design rules adapted from "专精 HTML 演示文稿的顶级视觉设计师" (24 design styles reference). Base HTML structure and tooling inspired by zarazhangrui/frontend-slides.
Diagram Drawing
Skill by 0xcjl
anti-sycophancy
Three-layer sycophancy defense based on ArXiv 2602.23971. Use /anti-sycophancy install to deploy all layers, or manage individually via install-claude-code / install-openclaw / uninstall / status / verify. Layer 1: CC-only hook; Layer 2: SKILL (cross-platform); Layer 3: CLAUDE.md (CC) / SOUL.md (OC).
auto-diary
Automatically write daily/weekly/monthly diary summaries and extract insights to auto-learn.md for HexaLoop.