Official Verified

data-scraper

Web page data collection and structured text extraction

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/mupengi-bot/data-scraper

Download Source Code (.zip)

data-scraper

Web Data Scraper — Extract structured data from web pages using curl + parsing. Lightweight, no browser required. Supports HTML-to-text, table extraction, price monitoring, and batch scraping.

When to Use

Extract text content from web pages (articles, blogs, docs)
Scrape product prices, reviews, or listings
Monitor pages for changes (price drops, new content)
Batch-collect data from multiple URLs
Convert HTML tables to structured formats (JSON/CSV)

Quick Start

# Extract readable text from URL
data-scraper fetch "https://example.com/article"

# Extract specific elements
data-scraper extract "https://example.com" --selector "h2, .price"

# Monitor for changes
data-scraper watch "https://example.com/product" --interval 3600

Extraction Modes

Text Mode (default)

Fetches page and extracts readable content, stripping HTML tags, scripts, and styles. Similar to reader mode.

data-scraper fetch URL
# Output: clean markdown text

Selector Mode

Target specific CSS selectors for precise extraction.

data-scraper extract URL --selector ".product-title, .price, .rating"
# Output: matched elements as structured data

Table Mode

Extract HTML tables into structured formats.

data-scraper table URL --index 0
# Output: JSON array of row objects (header → value mapping)

Link Mode

Extract all links from a page with optional filtering.

data-scraper links URL --filter "*.pdf"
# Output: filtered list of absolute URLs

Batch Scraping

# Scrape multiple URLs
data-scraper batch urls.txt --output results/

# With rate limiting
data-scraper batch urls.txt --delay 2000 --output results/

urls.txt format:

https://site1.com/page
https://site2.com/page
https://site3.com/page

Change Monitoring

# Watch for changes, alert on diff
data-scraper watch URL --selector ".price" --interval 3600

# Compare with previous snapshot
data-scraper diff URL

Stores snapshots in data-scraper/snapshots/ with timestamps. Alerts via notification-hub when changes detected.

Output Formats

Format	Flag	Use Case
Text	`--format text`	Reading, summarization
JSON	`--format json`	Data processing
CSV	`--format csv`	Spreadsheets
Markdown	`--format md`	Documentation

Headers & Auth

# Custom headers
data-scraper fetch URL --header "Authorization: Bearer TOKEN"

# Cookie-based auth
data-scraper fetch URL --cookie "session=abc123"

# User-Agent override
data-scraper fetch URL --ua "Mozilla/5.0..."

Rate Limiting & Ethics

Default: 1 request per second per domain
Respects robots.txt when --polite flag is set
Configurable delay between requests
Stops on 429 (Too Many Requests) and backs off

Error Handling

Read Full Documentation on GitHub

Metadata

Author@mupengi-bot

Stars1335

Updated2026-02-23

View Author Profile

AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-mupengi-bot-data-scraper": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

prompt-engineer

Expert prompt engineer specializing in advanced prompting techniques, LLM optimization, and AI system design. Masters chain-of-thought, constitutional AI, and production prompt strategies. Use when building AI features, improving agent performance, or crafting system prompts.

mupengi-bot 1335

appointment-scheduler

Automated appointment management for beauty salons, clinics, studios, and photo booths. Handles booking requests, calendar sync, conflict detection, reminders, no-show tracking, and waitlist management.

mupengi-bot 1335

Mupeng Social Postcjo

Skill by mupengi-bot

mupengi-bot 1335

brand-voice

Manage brand tone/style for all writing skills

mupengi-bot 1335

auto-reply

Instagram DM auto-reply system. DM monitoring, reading, replying, security check (injection rejection). Use when checking Instagram DMs, reading unread messages, replying to DMs, setting up DM monitoring cron jobs, or handling DM auto-reply workflows. Triggers on: Instagram DM, DM check, DM reply, DM auto-reply, dm-alert.

mupengi-bot 1335