Deep Scraper
Skill by opsun
Why use this skill?
Use Deep Scraper for reliable, containerized web data extraction. Bypass protections on YouTube and X to feed clean data to your LLMs.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/opsun/deep-scraper
What This Skill Does
The Deep Scraper skill is a high-performance, containerized solution for extracting raw data from complex, heavily protected websites such as YouTube and X (Twitter). Pairing a Docker-based environment with Crawlee and Playwright, the tool intercepts page rendering to deliver clean, structured data directly to your LLM processing pipeline. Unlike standard scrapers, Deep Scraper is built to bypass advanced anti-bot protections, giving reliable access to transcripts, descriptions, and metadata without the clutter of advertisements or UI noise.
Installation
To integrate this skill into your OpenClaw environment, make sure Docker is installed and running on your host machine. First, run the installation command: clawhub install openclaw/skills/skills/opsun/deep-scraper. Once the files are in your skills/ directory, build the container image with docker build -t clawd-crawlee skills/deep-scraper/. This prepares the isolated environment needed for complex browser emulation.
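Put together, the installation steps above look like this. The first two commands come straight from this section; the final docker image inspect check is an optional extra step to confirm the build succeeded before running any scraping tasks:

```shell
# 1. Install the skill files into your skills/ directory
clawhub install openclaw/skills/skills/opsun/deep-scraper

# 2. Build the isolated container image used for browser emulation
docker build -t clawd-crawlee skills/deep-scraper/

# 3. (Optional) Verify the image exists; fails with a non-zero exit if the build did not complete
docker image inspect clawd-crawlee --format 'clawd-crawlee image ready'
```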
Use Cases
- Research & Analysis: Automatically pull video transcripts from educational or technical YouTube content for long-form analysis.
- Social Media Monitoring: Collect public descriptions and text data from X threads to build sentiment or trend-watching databases.
- Content Curation: Strip away web noise from articles to extract the pure text required for summarization tasks.
Example Prompts
- "Deep Scraper, can you get the full transcript for this YouTube URL so I can summarize the key points? [URL]"
- "Go to this Twitter thread and scrape the text of all the main posts; ignore the comments and ads."
- "Please extract the description and main text content from this complex news article and format it for my study notes."
Tips & Limitations
- Privacy: The skill strictly adheres to public data policies and will not bypass password protections or private account settings.
- Performance: Always ensure your Docker daemon is active before running tasks to avoid container initialization errors.
- Validation: When scraping YouTube, the tool automatically verifies video IDs before storing results; rely on this check to keep your data caches clean and accurate.
- Updates: As web platforms frequently change their DOM structure, periodically rebuild your Docker image to ensure Crawlee and Playwright are up to date.
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-opsun-deep-scraper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
}
Tags: AI
Flags: network-access, data-collection, code-execution