Official Verified developer tools Safety 3/5

crawl

Crawl any website and save pages as local markdown files. Use when you need to download documentation, knowledge bases, or web content for offline access or analysis. No code required - just provide a URL.

Why use this skill?

Efficiently crawl any website and save content as structured Markdown files. Ideal for documentation archiving and data analysis with zero code requirements.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/barneyjm/crawl

What This Skill Does

The Crawl skill is a web crawling and content extraction tool that lets OpenClaw users convert live websites into structured Markdown files for offline use. Using the Tavily Search API, it follows a site's navigational links and extracts readable content, either across the whole site or narrowed to specific focus areas. The results are saved as files in your local workspace.

Installation

To install the Crawl skill, run the following command in your terminal: clawhub install openclaw/skills/skills/barneyjm/crawl. After installation, you must configure your Tavily API key to authorize the skill. Add your credentials to your configuration file at ~/.claude/settings.json under the env object as TAVILY_API_KEY. Once configured, the skill is ready to be triggered via the ./scripts/crawl.sh interface or directly through your agent's command interface.
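
Before the first crawl, it can help to confirm that the key is actually where the skill expects it. The sketch below assumes the settings file is plain JSON with a top-level env object, as described above. The helper name has_tavily_key is hypothetical and not part of the skill, and the key value in the usage note is a placeholder.

```python
import json
from pathlib import Path


def has_tavily_key(settings_text: str) -> bool:
    """Return True if the JSON text has a non-empty env.TAVILY_API_KEY string."""
    try:
        settings = json.loads(settings_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(settings, dict):
        return False
    env = settings.get("env")
    if not isinstance(env, dict):
        return False
    key = env.get("TAVILY_API_KEY")
    return isinstance(key, str) and bool(key.strip())


if __name__ == "__main__":
    # Path taken from the installation instructions above.
    path = Path.home() / ".claude" / "settings.json"
    if path.exists():
        print("TAVILY_API_KEY configured:", has_tavily_key(path.read_text()))
    else:
        print("No settings file found at", path)
```

For example, has_tavily_key('{"env": {"TAVILY_API_KEY": "tvly-placeholder"}}') returns True, while an empty env object or malformed JSON returns False.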

Use Cases

Documentation Archiving: Download entire documentation suites for offline reference or to provide a static knowledge base for local LLM fine-tuning or RAG implementations.
Market Analysis: Extract content from industry portals, blogs, or competitors' websites to perform automated sentiment or trend analysis.
Knowledge Management: Aggregate scattered web resources into a unified, clean Markdown repository that can be easily searched or indexed with your local tools.
Developer Workflows: Automatically fetch API references and code samples from online documentation to speed up integration processes.

Example Prompts

"Crawl the documentation at https://docs.openclaw.com with a max depth of 2, and save the content into the ./docs folder so I can reference it while offline."
"Can you perform a focused crawl of https://api-guide.example.com? I specifically need to extract the section on authentication and error handling into a single report."
"Go to https://tech-updates.com and gather all recent blog posts from the last month that mention AI agent architectures, and save them as separate markdown files."

Tips & Limitations

When using the Crawl skill, start with a lower max_depth to avoid excessive data usage and unnecessary API costs. Use select_paths and exclude_paths (which support regex) to focus the crawler on relevant documentation pages and avoid noise like footers, social links, or administrative login pages. Note that the skill relies on the Tavily API, so ensure your network permits these requests. Always be mindful of website robots.txt policies when performing deep crawls. For better results, provide clear semantic instructions to allow the agent to filter out irrelevant chunks, which is particularly effective when working with large, complex domains.
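
To get a feel for how regex path filters narrow a crawl, the following sketch applies select and exclude patterns to a handful of made-up URLs. It illustrates the general idea only. The function keep_url is hypothetical, the example URLs are invented, and the matching rules (exclude wins, unanchored search against the URL path) are assumptions rather than the documented behavior of the skill or the Tavily API.

```python
import re
from urllib.parse import urlparse


def keep_url(url, select_paths=(), exclude_paths=()):
    """Decide whether a URL passes the path filters.

    Assumed semantics (not confirmed by the skill's docs): a URL is dropped
    if its path matches any exclude pattern; otherwise it is kept when no
    select patterns are given or when at least one select pattern matches.
    """
    path = urlparse(url).path or "/"
    if any(re.search(p, path) for p in exclude_paths):
        return False
    if not select_paths:
        return True
    return any(re.search(p, path) for p in select_paths)


# Invented URLs for illustration only.
urls = [
    "https://docs.example.com/docs/intro",
    "https://docs.example.com/docs/api/auth",
    "https://docs.example.com/login",
    "https://docs.example.com/blog/2024/news",
]

# Keep only documentation pages and drop the login page.
kept = [u for u in urls if keep_url(u, [r"^/docs/"], [r"/login$"])]
print(kept)
```

With these patterns, only the two URLs under /docs/ remain. Testing patterns against a few known URLs like this before a deep crawl is a cheap way to avoid spending API calls on pages you do not want.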

Metadata

Author: @barneyjm

Stars: 1100

Updated: 2026-02-17

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-barneyjm-crawl": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#web-scraping #automation #data-extraction #markdown #knowledge-management

Safety Score: 3/5

Flags: network-access, file-write, external-api
