ClawKit Reliability Toolkit
Official · Verified · Developer tools · Safety 3/5

crawl

Crawl any website and save pages as local markdown files. Use when you need to download documentation, knowledge bases, or web content for offline access or analysis. No code required - just provide a URL.

Why use this skill?

Efficiently crawl any website and save content as structured Markdown files. Ideal for documentation archiving and data analysis with zero code requirements.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/barneyjm/crawl

What This Skill Does

The Crawl skill is a web scraping and content extraction tool that helps OpenClaw users convert live websites into structured, offline-ready Markdown files. Using the Tavily API, it navigates URL structures, follows navigational links, and extracts readable content, either from an entire site or from specific focus areas you describe. The result is a set of clean local files that your workspace and tools can search and reuse.
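The skill's own ./scripts/crawl.sh is not reproduced on this page. As a rough illustration of the crawl-then-save flow described above, the Python sketch below assumes Tavily's public Crawl endpoint (`https://api.tavily.com/crawl`), Bearer-token authentication, and a JSON response whose `results` entries carry `url` and `raw_content`. These details, and the helper names `build_crawl_request`, `url_to_filename`, `save_results`, and `crawl_to_markdown`, are assumptions for illustration, not this skill's implementation.

```python
import json
import re
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# ASSUMPTION: endpoint, auth header, and payload fields follow Tavily's public
# Crawl API; they are not taken from this skill's ./scripts/crawl.sh.
TAVILY_CRAWL_URL = "https://api.tavily.com/crawl"


def build_crawl_request(url: str, api_key: str, max_depth: int = 1) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Tavily to crawl `url`."""
    payload = {"url": url, "max_depth": max_depth}
    return urllib.request.Request(
        TAVILY_CRAWL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def url_to_filename(page_url: str) -> str:
    """Turn a page URL into a flat, filesystem-safe .md filename (hypothetical scheme)."""
    parsed = urlparse(page_url)
    stem = (parsed.netloc + parsed.path).strip("/") or "index"
    # Replace slashes and any other unsafe characters with underscores.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem)
    return f"{stem}.md"


def save_results(results: list[dict], out_dir: str) -> list[Path]:
    """Write each result's raw_content (assumed field name) to its own markdown file."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for item in results:
        path = target / url_to_filename(item["url"])
        path.write_text(item.get("raw_content") or "", encoding="utf-8")
        written.append(path)
    return written


def crawl_to_markdown(url: str, api_key: str, out_dir: str, max_depth: int = 1) -> list[Path]:
    """Send the crawl request and save every returned page (requires network access)."""
    with urllib.request.urlopen(build_crawl_request(url, api_key, max_depth)) as resp:
        body = json.load(resp)
    return save_results(body.get("results", []), out_dir)
```

With a valid key, `crawl_to_markdown("https://docs.example.com", key, "./docs")` would write one file per crawled page, for example `docs.example.com_guide_auth.md` for `/guide/auth`. Flattening paths into filenames keeps the sketch short; mirroring the site's directory tree is an equally reasonable choice.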

Installation

To install the Crawl skill, run the following command in your terminal: clawhub install openclaw/skills/skills/barneyjm/crawl. After installation, configure your Tavily API key so the skill can authenticate: in ~/.claude/settings.json, add TAVILY_API_KEY to the env object. Once configured, you can trigger the skill through the ./scripts/crawl.sh script or directly through your agent's command interface.
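If ~/.claude/settings.json already holds other settings, the key should be merged into it rather than pasted over the file. The Python sketch below shows one way to do that; it assumes the file is a single JSON object with an optional "env" object, and `set_tavily_key` is a hypothetical helper name.

```python
import json
from pathlib import Path


def set_tavily_key(settings_path: Path, api_key: str) -> dict:
    """Add TAVILY_API_KEY to the "env" object, keeping every other setting intact.

    ASSUMPTION: the settings file is one JSON object; an empty or missing
    file is treated as {}.
    """
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8") or "{}")
    else:
        settings = {}
    # Create "env" if absent, then set (or overwrite) only this one key.
    settings.setdefault("env", {})["TAVILY_API_KEY"] = api_key
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

For example, `set_tavily_key(Path.home() / ".claude" / "settings.json", "tvly-...")` leaves existing entries untouched and adds `"env": {"TAVILY_API_KEY": "tvly-..."}`.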

Use Cases

  • Documentation Archiving: Download entire documentation suites for offline reference or to provide a static knowledge base for local LLM fine-tuning or RAG implementations.
  • Market Analysis: Extract content from industry portals, blogs, or competitors' websites to perform automated sentiment or trend analysis.
  • Knowledge Management: Aggregate scattered web resources into a unified, clean Markdown repository that can be easily searched or indexed by your local files.
  • Developer Workflows: Automatically fetch API references and code samples from online documentation to speed up integration processes.

Example Prompts

  1. "Crawl the documentation at https://docs.openclaw.com with a max depth of 2, and save the content into the ./docs folder so I can reference it while offline."
  2. "Can you perform a focused crawl of https://api-guide.example.com? I specifically need to extract the section on authentication and error handling into a single report."
  3. "Go to https://tech-updates.com and gather all recent blog posts from the last month that mention AI agent architectures, and save them as separate markdown files."
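How the agent turns these prompts into crawl parameters is not documented here. The Python sketch below is one hypothetical translation of prompts 1 and 2, assuming Tavily Crawl parameter names (`max_depth`, `instructions`), plus a hypothetical `build_report` helper that merges crawled pages into the "single report" prompt 2 asks for.

```python
# HYPOTHETICAL: one possible mapping of example prompts 1 and 2 onto Tavily
# Crawl request bodies. Parameter names assume Tavily's Crawl API; the agent
# may translate the prompts differently.
PROMPT_1_REQUEST = {
    "url": "https://docs.openclaw.com",
    "max_depth": 2,  # "with a max depth of 2"; saving under ./docs happens locally
}

PROMPT_2_REQUEST = {
    "url": "https://api-guide.example.com",
    # Focused crawl: natural-language guidance about which content matters.
    "instructions": "Authentication and error handling sections",
}


def build_report(title: str, results: list[dict]) -> str:
    """Combine crawled pages into one markdown report, one section per source URL."""
    parts = [f"# {title}", ""]
    for item in results:
        parts.append(f"## Source: {item['url']}")
        parts.append("")
        # raw_content is an assumed field name for each page's extracted text.
        parts.append((item.get("raw_content") or "").strip())
        parts.append("")
    return "\n".join(parts)
```

Keeping each source URL as a section heading makes it easy to trace any passage in the combined report back to the page it came from.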

Tips & Limitations

  • Start with a low max_depth to avoid excessive data usage and unnecessary API costs.
  • Use select_paths and exclude_paths (both accept regex) to keep the crawler on relevant documentation pages and skip noise such as social links or administrative login pages. Because these filters match URL paths, they decide which pages are crawled; they cannot strip elements such as footers from within a page.
  • The skill relies on the Tavily API, so make sure your network permits requests to it.
  • Respect each website's robots.txt policy, especially when performing deep crawls.
  • On large, complex domains, give clear natural-language instructions describing the content you want so the agent can filter out irrelevant chunks.
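To see how select_paths and exclude_paths patterns interact before spending API calls, the Python sketch below checks candidate URLs locally. It assumes each pattern must match the whole URL path (`re.fullmatch`) and that exclusions take precedence over selections; Tavily's exact matching rules may differ, and `path_allowed` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse


def path_allowed(url: str, select_paths: list[str], exclude_paths: list[str]) -> bool:
    """Approximate path filtering: exclusions win, then selections (if any) must match.

    ASSUMPTION: patterns are matched against the full URL path with re.fullmatch.
    """
    path = urlparse(url).path or "/"
    if any(re.fullmatch(p, path) for p in exclude_paths):
        return False
    if select_paths:
        return any(re.fullmatch(p, path) for p in select_paths)
    return True  # no selection patterns: everything not excluded is allowed
```

With `select_paths=[r"/docs/.*"]` and `exclude_paths=[r".*/login.*"]`, `/docs/intro` passes, while `/docs/login` and `/blog/post` are filtered out.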

Metadata

Author: @barneyjm
Stars: 1100
Views: 0
Updated: 2026-02-17

Add to Configuration

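Paste this into your clawhub.json to enable this plugin. If the file already has a "plugins" object, add the "official-barneyjm-crawl" entry inside it rather than replacing the existing object.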

{
  "plugins": {
    "official-barneyjm-crawl": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#web-scraping #automation #data-extraction #markdown #knowledge-management
Safety Score: 3/5

Flags: network-access, file-write, external-api