AnyCrawl
Skill by techlaai
Why use this skill?
Supercharge your OpenClaw agent with AnyCrawl. Scrape web content, crawl SPAs, and perform structured Google searches using powerful browser engines.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/techlaai/anycrawl
What This Skill Does
AnyCrawl is a web crawling and scraping skill for the OpenClaw AI agent ecosystem. It lets your agent turn web pages into actionable, structured data. Use it to extract specific metadata, scrape content from dynamic Single Page Applications (SPAs) with headless browser engines, or run complex Google searches. It supports several engines, including Cheerio, Playwright, and Puppeteer, so you can match the engine to the complexity of the target website.
Installation
To install this skill, run the following command in your terminal:
clawhub install openclaw/skills/skills/techlaai/anycrawl
After installation, configure your API key. You can set it globally with an environment variable:
export ANYCRAWL_API_KEY="your-api-key"
Alternatively, you can configure it directly within the OpenClaw gateway using the openclaw config.patch command.
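If you use the environment-variable route, an agent-side script can fail fast when the key is missing instead of failing on the first request. The Python sketch below assumes only that the key lives in ANYCRAWL_API_KEY, as in the export command above; the helper name load_anycrawl_api_key is hypothetical and is not part of AnyCrawl or OpenClaw.

```python
import os
from typing import Mapping, Optional


def load_anycrawl_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return the AnyCrawl API key or raise if it is unset or blank."""
    # Default to the real process environment; pass a dict to override (e.g. in tests).
    source = os.environ if env is None else env
    key = source.get("ANYCRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANYCRAWL_API_KEY is not set; export it before starting the agent.")
    return key
```

Passing an explicit mapping keeps the check testable without touching the real environment.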
Use Cases
- Market Intelligence: Monitor competitor pricing by scraping e-commerce product pages and extracting structured JSON data.
- Content Curation: Automatically gather research materials, summarize articles, or build datasets by crawling multiple search result pages.
- Dynamic Web Interaction: Use Playwright/Puppeteer engines to extract data from JavaScript-heavy websites that standard crawlers cannot interpret.
- Search Augmentation: Enhance your agent's real-time knowledge base by enabling advanced Google search capabilities.
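For the dynamic-content case, one rough way to decide whether a page needs a browser engine is to look at its static HTML and check whether it already contains readable text. The Python sketch below is a hypothetical heuristic, not AnyCrawl's own engine-selection logic: suggest_engine and its 200-character threshold are illustrative assumptions, and it returns "playwright" where "puppeteer" would serve equally well.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_tags += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies; count only text a reader would see.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def suggest_engine(static_html: str, min_text_chars: int = 200) -> str:
    """Hypothetical heuristic: 'cheerio' if the static HTML already carries text,
    otherwise 'playwright' (the content is probably rendered by JavaScript)."""
    counter = _TextCounter()
    counter.feed(static_html)
    counter.close()
    if counter.text_chars < min_text_chars and counter.script_tags > 0:
        return "playwright"
    return "cheerio"
```

A near-empty body plus script tags is the typical SPA shell (for example a lone div with id "root"), which is why that case maps to a browser engine.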
Example Prompts
- "AnyCrawl the latest tech news from https://techcrunch.com and provide a summary of the top 3 articles in markdown format."
- "Search for 'latest advancements in quantum computing' on Google and scrape the first 5 results to extract key research findings into a JSON format."
- "Go to this product URL and extract the product name, current price, and availability status into a structured object using the cheerio engine."
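When a prompt asks for a structured object, as in the product example, it helps to validate the result before the agent acts on it. This Python sketch assumes the scraped object uses the keys name, price (a number), and availability; those field names, the Product class, and the allowed availability values are hypothetical choices, not a schema AnyCrawl defines.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical normalized availability values.
ALLOWED_AVAILABILITY = {"in_stock", "out_of_stock", "preorder", "unknown"}


@dataclass(frozen=True)
class Product:
    name: str
    price: float
    availability: str


def parse_product(raw: Mapping[str, Any]) -> Product:
    """Validate a scraped product object (assumed keys: name, price, availability)."""
    name = raw.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("missing or empty 'name'")
    price = raw.get("price")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)) or price < 0:
        raise ValueError(f"invalid 'price': {price!r}")
    # Normalize e.g. "In Stock" -> "in_stock"; anything unrecognized becomes "unknown".
    availability = str(raw.get("availability", "unknown")).strip().lower().replace(" ", "_")
    if availability not in ALLOWED_AVAILABILITY:
        availability = "unknown"
    return Product(name=name.strip(), price=float(price), availability=availability)
```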
Tips & Limitations
- Engine Selection: Use the default cheerio engine for speed and efficiency on simple HTML pages. Switch to playwright or puppeteer only when dealing with dynamic content that requires JS execution.
- Timeout Management: Always set appropriate timeouts for complex pages. Browser engines are resource-heavy; don't set an unnecessarily long wait_for_selector period, as it increases latency and compute costs.
- Rate Limiting: Be mindful of the target website's robots.txt and terms of service. High-frequency scraping may lead to IP blocking or CAPTCHA triggers.
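To follow the rate-limiting advice, you can filter URLs against robots.txt before handing them to the scraper. This sketch uses Python's standard urllib.robotparser and assumes you have already fetched the site's robots.txt as text; the user-agent string and the one-second fallback delay are placeholder assumptions, and whether AnyCrawl applies its own throttling is not covered here.

```python
import time
from typing import Iterable, Iterator
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt content that was already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(
    parser: RobotFileParser,
    urls: Iterable[str],
    user_agent: str = "my-openclaw-agent",  # placeholder user agent
    default_delay: float = 1.0,  # assumed fallback when no Crawl-delay is declared
) -> Iterator[str]:
    """Yield only URLs that robots.txt allows, pausing between them for the
    site's Crawl-delay (or default_delay if none is declared)."""
    declared = parser.crawl_delay(user_agent)
    delay = default_delay if declared is None else float(declared)
    first = True
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # disallowed path: skip it entirely
        if not first:
            time.sleep(delay)
        first = False
        yield url
```

Because polite_urls is a generator, the pauses happen lazily as the agent consumes each URL.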
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-techlaai-anycrawl": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, external-api, data-collection