Official, Verified, browser automation, Safety 3/5

Deep Scraper

Skill by opsun

Why use this skill?

Use Deep Scraper for reliable, containerized web data extraction. Bypass protections on YouTube and X to feed clean data to your LLMs.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/opsun/deep-scraper

What This Skill Does

Deep Scraper is a containerized skill for extracting raw data from complex, heavily protected websites such as YouTube and X (Twitter). It runs Crawlee and Playwright inside a Docker container, intercepts page rendering, and delivers clean, structured data directly to your LLM processing pipeline. Unlike standard scrapers, Deep Scraper is built to bypass advanced anti-bot protections, giving reliable access to transcripts, descriptions, and metadata without advertisements or UI noise.

Installation

To add this skill to your OpenClaw environment, make sure Docker is installed and running on your host machine. First, run the installation command: clawhub install openclaw/skills/skills/opsun/deep-scraper. Once the files are in your skills/ directory, build the container image from the directory that contains skills/: docker build -t clawd-crawlee skills/deep-scraper/. This prepares the isolated environment the skill needs for browser emulation.
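The two installation steps above can be scripted. The following is a minimal Python sketch, not part of the skill itself: the function names setup_commands and run_setup are hypothetical, and it assumes that both clawhub and docker are on your PATH and that the build is run from the directory containing skills/. The command strings are copied from the text above.

```python
import shutil
import subprocess

# Values copied from the installation instructions above.
SKILL_PATH = "openclaw/skills/skills/opsun/deep-scraper"
IMAGE_TAG = "clawd-crawlee"
BUILD_CONTEXT = "skills/deep-scraper/"


def setup_commands():
    """Return the install and build commands, in order, as argument lists."""
    return [
        ["clawhub", "install", SKILL_PATH],
        ["docker", "build", "-t", IMAGE_TAG, BUILD_CONTEXT],
    ]


def run_setup(workdir="."):
    """Run both commands from workdir, stopping at the first failure.

    Assumption: workdir is the directory that contains skills/.
    """
    for tool in ("clawhub", "docker"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} was not found on PATH")
    for cmd in setup_commands():
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling run_setup() from your workspace root would run the install first and the image build second. Keeping the commands as argument lists avoids shell quoting issues.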

Use Cases

  • Research & Analysis: Automatically pull video transcripts from educational or technical YouTube content for long-form analysis.
  • Social Media Monitoring: Collect public descriptions and text data from X threads to build sentiment or trend-watching databases.
  • Content Curation: Strip away web noise from articles to extract the pure text required for summarization tasks.

Example Prompts

  1. "Deep Scraper, can you get the full transcript for this YouTube URL so I can summarize the key points? [URL]"
  2. "Go to this Twitter thread and scrape the text of all the main posts; ignore the comments and ads."
  3. "Please extract the description and main text content from this complex news article and format it for my study notes."

Tips & Limitations

  • Privacy: The skill strictly adheres to public data policies and will not bypass password protections or private account settings.
  • Performance: Always ensure your Docker daemon is active before running tasks to avoid container initialization errors.
  • Validation: When scraping YouTube, the tool automatically verifies video IDs; rely on this check to keep your data caches clean and accurate.
  • Updates: As web platforms frequently change their DOM structure, periodically rebuild your Docker image to ensure Crawlee and Playwright are up to date.
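Some of the tips above can be checked from a script before a task is started. The sketch below is illustrative only, and all function names are hypothetical. It assumes that a successful docker info call means the daemon is reachable, that the image tag and build path are the ones given in the Installation section, and that YouTube video IDs are 11 characters drawn from letters, digits, underscore, and hyphen (a common convention, not something this page states). It does not reproduce the skill's own ID verification, whose internals are not described here.

```python
import re
import subprocess
from urllib.parse import parse_qs, urlparse

# Assumed ID shape: 11 URL-safe characters.
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")


def docker_daemon_running(timeout=10):
    """Return True if `docker info` succeeds, i.e. the CLI can reach the daemon."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # docker is missing, not executable, or did not answer in time.
        return False
    return result.returncode == 0


def rebuild_command():
    """Build command that re-pulls the base image and ignores the layer cache."""
    return ["docker", "build", "--pull", "--no-cache",
            "-t", "clawd-crawlee", "skills/deep-scraper/"]


def extract_video_id(url):
    """Return the video ID from a common YouTube URL form, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed"):
            candidate = parts[1]
    if candidate and _VIDEO_ID.match(candidate):
        return candidate
    return None
```

A caller could refuse to start a scrape when docker_daemon_running() returns False, and could compare extract_video_id(url) with the ID attached to a cached result before reusing it.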

Metadata

Author: @opsun
Stars: 1287
Views: 2
Updated: 2026-02-22

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-opsun-deep-scraper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
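If you already have a clawhub.json with other plugins, pasting the fragment by hand risks overwriting them. The following Python sketch merges the entry instead. The helper names enable_plugin and update_config_file are hypothetical, and the default file location (clawhub.json in the current directory) is an assumption; only the plugin key and its two settings come from the fragment above.

```python
import json
from pathlib import Path

# Plugin key taken from the configuration fragment above.
PLUGIN_ID = "official-opsun-deep-scraper"


def enable_plugin(config):
    """Return a copy of config with the Deep Scraper entry added or replaced."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path="clawhub.json"):
    """Read the config (or start empty), merge the entry, and write it back.

    Assumption: the file lives at `path`; adjust to your setup.
    """
    config_path = Path(path)
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Because enable_plugin returns a new dictionary rather than mutating its argument, existing plugin entries are preserved and the merge can be inspected before anything is written to disk.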

Tags (AI)

#scraping #automation #crawlee #docker #web-data

Safety Score: 3/5

Flags: network-access, data-collection, code-execution