Official Verified

web-reader-pro

Advanced web content extraction skill for OpenClaw using multi-tier fallback strategy (Jina → Scrapling → WebFetch) with intelligent routing, caching, quality scoring, and domain learning. Use when: reading article content, extracting web page text, scraping dynamic JS-heavy pages, or fetching WeChat official account articles.

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/0xcjl/web-reader-pro

Download Source Code (.zip)

Web Reader Pro - OpenClaw Skill

Overview

Web Reader Pro is an advanced web content extraction skill for OpenClaw that uses a multi-tier fallback strategy with intelligent routing, caching, and quality assessment.

Features

1. Three-Tier Fallback Strategy

Tier 1: Jina Reader API - Fast, reliable, best for most websites
Tier 2: Scrapling + Playwright - Dynamic content rendering for JS-heavy sites
Tier 3: WebFetch Fallback - Basic extraction for simple pages

2. Jina Quota Monitoring

Tracks API call count with persistent counter
Warning alerts when approaching quota limits
Automatic fallback to lower-tier methods when quota exhausted

3. Smart Cache Layer

Short-term caching (configurable TTL, default 1 hour)
Cache key based on URL hash
Reduces redundant API calls

4. Extraction Quality Scoring

Scores based on: word count, title detection, content density
Minimum quality threshold (default: 200 words + valid title)
Auto-escalation to next tier if quality below threshold

5. Domain-Level Routing Learning

Learns optimal extraction tier per domain
Persists learned routes in local JSON database
Adapts based on historical success rates

6. Retry with Exponential Backoff

Configurable max retries per tier (default: 3)
Exponential backoff: 1s, 2s, 4s, 8s...
Respects rate limits and transient failures

Installation

# Install dependencies
pip install -r requirements.txt

# Install Scrapling (requires Node.js)
./scripts/install_scrapling.sh

# Or install Scrapling manually
npm install -g @scrapinghub/scrapling

Usage

Basic Usage

from scripts.web_reader_pro import WebReaderPro

reader = WebReaderPro()
result = reader.fetch("https://example.com")
print(result['title'])
print(result['content'])

Advanced Configuration

reader = WebReaderPro(
    jina_api_key="your-jina-key",      # Optional: set via env JINA_API_KEY
    cache_ttl=3600,                      # Cache TTL in seconds (default: 3600)
    quality_threshold=200,               # Min word count for quality (default: 200)
    max_retries=3,                       # Max retries per tier (default: 3)
    enable_learning=True,                # Enable domain learning (default: True)
    scrapling_path="/usr/local/bin/scrapling"  # Path to scrapling binary
)

Result Format

{
    "title": "Page Title",
    "content": "Extracted content in markdown...",
    "url": "https://example.com",
    "tier_used": "jina|scrapling|webfetch",
    "quality_score": 85,
    "cached": False,
    "domain_learned_tier": "jina",
    "extracted_at": "2024-01-01T00:00:00Z"
}

Environment Variables

Read Full Documentation on GitHub

Metadata

Author@0xcjl

Stars4473

Updated2026-05-01

View Author Profile

AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-0xcjl-web-reader-pro": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

autoresearch-pro

Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).

0xcjl 4473

cjl-slides

Create stunning HTML presentations in 24 international design styles with strict design rules. Export to .pptx for PowerPoint editing. ## Design Philosophy - Aesthetic-first: each style is a curated visual system, not just colors - Font whitelist enforcement: prevents AI-generic typography - Container ratio lock (16:9): ensures consistent rendering across devices - Zero external dependencies: pure HTML/CSS/JS, works offline ## Usage 1. Activate → Select style by name/number or browse 24 options 2. Provide content (topic, audience, key points) or upload .pptx for conversion 3. Review generated HTML slides → request modifications (color/font/layout) 4. Optionally export .pptx for manual editing in PowerPoint ## Precautions - Fonts are restricted to a whitelist; custom fonts require adding to the allowed list first - Chart.js CDN is used; if blocked, falls back to jsdelivr mirror - HTML files must retain their relative structure when shared - .pptx export preserves exact colors and fonts but layout uses pptx-native elements ## Credits Design rules adapted from "专精 HTML 演示文稿的顶级视觉设计师" (24 design styles reference). Base HTML structure and tooling inspired by zarazhangrui/frontend-slides.

0xcjl 4473

Diagram Drawing

Skill by 0xcjl

0xcjl 4473

anti-sycophancy

Three-layer sycophancy defense based on ArXiv 2602.23971. Use /anti-sycophancy install to deploy all layers, or manage individually via install-claude-code / install-openclaw / uninstall / status / verify. Layer 1: CC-only hook; Layer 2: SKILL (cross-platform); Layer 3: CLAUDE.md (CC) / SOUL.md (OC).

0xcjl 4473

auto-diary

Automatically write daily/weekly/monthly diary summaries and extract insights to auto-learn.md for HexaLoop.

0xcjl 4473