firecrawl
Web scraping and content extraction using Firecrawl API. Use when users need to crawl websites, extract structured data, convert web pages to markdown, scrape multiple URLs, or build knowledge bases from web content. Supports single page extraction, site-wide crawling, batch processing, and structured data extraction with CSS selectors.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/antonia-sz/web-scraper-firecrawl
What This Skill Does
The firecrawl skill is an interface for web scraping and content extraction, optimized for LLM consumption. It turns raw web content into clean, structured data. Using the Firecrawl API, the skill handles JavaScript rendering, site-wide navigation, and granular content extraction. Users can fetch single pages, run recursive site crawls, map URL structures, and extract structured data with CSS selectors. Output is typically clean Markdown, which suits RAG (Retrieval-Augmented Generation) pipelines, content migration, and automated research tasks. This removes two common chores of manual scraping: waiting for dynamic content to load and cleaning up HTML.
Installation
To install the skill, run: clawhub install openclaw/skills/skills/antonia-sz/web-scraper-firecrawl. After installation, set the FIRECRAWL_API_KEY environment variable to a valid Firecrawl API key. Also make sure the requests library is installed in your Python environment; the skill uses it as the transport layer for API calls.
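The configuration described above can be checked before any crawl is attempted. The following is a minimal sketch, not the skill's actual code: the helper names `load_api_key` and `build_scrape_request` are hypothetical, and the endpoint and payload shape assume Firecrawl's hosted v1 REST API, which the skill may or may not use internally.

```python
import os

# Assumption: Firecrawl's hosted v1 scrape endpoint; the skill may target another version.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def load_api_key(env=os.environ):
    """Return the Firecrawl API key, or raise a clear error if it is unset."""
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set")
    return key


def build_scrape_request(page_url, api_key):
    """Build keyword arguments that can be passed to requests.post(**kwargs)."""
    return {
        "url": FIRECRAWL_SCRAPE_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Assumed payload shape: ask for the page as Markdown only.
        "json": {"url": page_url, "formats": ["markdown"]},
        "timeout": 30,
    }
```

With requests installed, a single-page fetch would then look like requests.post(**build_scrape_request("https://docs.example.com", load_api_key())). Keeping the request construction separate from the network call makes a missing key fail early with a readable message.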
Use Cases
- Knowledge Base Generation: Automatically convert technical documentation sites into unified Markdown files to serve as context for private AI agents.
- Competitive Intelligence: Batch-scrape competitor product pages and use structured extraction to pull pricing data into JSON format for comparison.
- Content Migration: Export entire websites for archival purposes or to migrate legacy CMS content into modern documentation systems.
Example Prompts
- "Firecrawl this documentation site at https://docs.example.com and save all pages as markdown files in my project folder for RAG training."
- "Use firecrawl to map all URLs on https://blog.example.com and then extract the article titles and publish dates using CSS selectors."
- "Scrape these 5 URLs provided in urls.txt and give me the output in clean markdown format for my research report."
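For the urls.txt prompt, the file contents usually need a little cleanup before they are handed to a batch scrape. A rough sketch, assuming one URL per line and `#` for comments (both are assumptions, as is the hypothetical helper name `parse_url_list`):

```python
from urllib.parse import urlparse


def parse_url_list(text):
    """Return unique http(s) URLs from text, one per line, in first-seen order."""
    seen = set()
    urls = []
    for line in text.splitlines():
        candidate = line.strip()
        # Skip blank lines and comment lines.
        if not candidate or candidate.startswith("#"):
            continue
        parts = urlparse(candidate)
        # Keep only absolute http(s) URLs that have not been seen yet.
        if parts.scheme in ("http", "https") and parts.netloc and candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls
```

Dropping duplicates before scraping avoids spending API credits on the same page twice.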
Tips & Limitations
- Rate Limiting: Always be mindful of the target website's robots.txt and your Firecrawl plan's rate limits. Use the --limit flag for large crawls to avoid excessive consumption.
- Dynamic Content: For sites heavily reliant on client-side rendering (SPA), use the --wait-for flag to ensure the JS fully executes before extraction.
- Data Privacy: Ensure you have authorization to crawl specific sites. The skill performs network requests and data collection; respect copyright and terms of service for the scraped content.
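The robots.txt and limit advice above can be applied as a pre-filter on the URL list before a crawl starts. This is a sketch using Python's standard `urllib.robotparser`; the helper name `allowed_urls` and its `limit` parameter are hypothetical and are not part of the skill.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, user_agent="*", limit=None):
    """Return the URLs that robots_txt permits for user_agent, capped at limit."""
    parser = RobotFileParser()
    # parse() takes the robots.txt body as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    permitted = [u for u in urls if parser.can_fetch(user_agent, u)]
    # A limit of None means no cap.
    return permitted if limit is None else permitted[:limit]
```

The robots.txt body is passed in as text so that the caller decides how and when to fetch it. This only checks crawl permissions; it does not track your Firecrawl plan's rate limits.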
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-antonia-sz-web-scraper-firecrawl": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, external-api
Related Skills
content-automation
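Content creation automation skill. Supports social media content generation, video script writing, and scheduled publishing task management. Triggers when users need to generate content in bulk, automate social media operations, or create video scripts.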
style-cloner
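Provide 1-5 reference articles plus raw material. The AI analyzes the stylistic features of the references, rewrites the material into a finished article in the same style, and outputs 3 versions to choose from. Supports intensity adjustment and iterative refinement.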
project-evaluator
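Describe a project idea and the AI evaluates it systematically along four dimensions (market, technology, business, and risk), producing an evaluation report, a quick competitor lookup, and MVP suggestions to help you decide whether it is worth building.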
maybe-finance
Personal finance management skill using Maybe Finance OS. Use when users need to track expenses, analyze budgets, monitor net worth, or manage personal finances through the Maybe Finance self-hosted platform. Supports transaction tracking, account management, budget analysis, and financial reporting.
notebooklm
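OpenClaw skill for the unofficial Google NotebookLM Python API. Supports content generation (podcasts, videos, slides, quizzes, mind maps, and more), document management, and research automation. Triggers when users need to use NotebookLM to generate audio overviews, videos, or study materials, or to manage knowledge bases.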