smart-web-scraper
Extract structured data from any web page. Supports CSS selectors, auto-detection of tables and lists, JSON/CSV output formats. Use when asked to scrape a website, extract data from a page, pull product info, gather contact details, or collect listings from a URL.
Why use this skill?
Scrape and extract structured data from web pages with the Smart Web Scraper. It supports CSS selectors, auto-detection of tables and lists, and several output formats (JSON, CSV, plain text, Markdown).
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/mariusfit/smart-web-scraper
What This Skill Does
The Smart Web Scraper extracts structured data from a web page. CSS selectors target specific elements precisely, and HTML tables and lists are detected and parsed automatically. Output can be JSON, CSV, plain text, or Markdown, with the option to save directly to a file. Multi-page crawling follows pagination so that data can be extracted across several pages. Typical targets are product information, contact details, and listings.
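To make the table detection and output formats described above concrete, here is a minimal sketch that uses only the Python standard library. It is not the skill's own implementation (which, per the tips below, relies on beautifulsoup4 and lxml); the names TableParser, table_to_records, records_to_csv, and SAMPLE_HTML are hypothetical and exist only for illustration. It assumes tables are not nested and that the first row holds the headers.

```python
import csv
import io
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows (lists of cell text)
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_to_records(rows):
    """Treat the first row as headers and return one dict per data row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def records_to_csv(records):
    """Serialize a list of dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical sample page fragment; a real run would use fetched HTML.
SAMPLE_HTML = """
<table>
  <tr><th>Plan</th><th>Price</th></tr>
  <tr><td>Basic</td><td>$5</td></tr>
  <tr><td>Pro</td><td>$20</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
records = table_to_records(parser.tables[0])
print(json.dumps(records, indent=2))  # JSON output
print(records_to_csv(records))        # CSV output
```

The same list of records feeds both serializers, which is why switching between JSON and CSV output does not require re-parsing the page.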
Installation
To install the Smart Web Scraper, use the following command:
clawhub install openclaw/skills/skills/mariusfit/smart-web-scraper
This command will download and set up the necessary components for the skill to function.
Use Cases
- E-commerce Data Collection: Scrape product names, prices, descriptions, and availability from online stores.
- Lead Generation: Extract contact information (emails, phone numbers) from business directories or company websites.
- Market Research: Gather data from competitor websites, such as feature lists, pricing tiers, or customer reviews.
- Content Aggregation: Collect articles, blog posts, or news listings from various sources.
- Data Analysis: Pull structured data from reports or public datasets presented in tables on web pages.
- Real Estate Listings: Extract property details, prices, and agent information from real estate portals.
Example Prompts
- "Scrape all product details from https://shop.example.com/gadgets using the CSS selector .product-item and save the output as JSON to gadgets.json."
- "Extract all the pricing tables from https://example.com/services and format the output as CSV."
- "Crawl the news website starting from https://news.example.com/page/1 for the next 10 pages, extracting article titles using the selector h2.article-title, and output as JSON."
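The crawl prompt above combines two ideas: matching elements with a selector such as h2.article-title and stepping through numbered pages. The sketch below illustrates both with the standard library only. It is an assumption-laden illustration, not how the skill works internally: TagClassExtractor handles only the simple tag.class form (a real CSS selector engine supports far more), page_urls assumes the URL ends in a page number, and SAMPLE_PAGE stands in for fetched HTML. All of these names are hypothetical.

```python
import json
from html.parser import HTMLParser


class TagClassExtractor(HTMLParser):
    """Collect the text of elements matching a simple `tag.class` selector.

    Assumes matched elements do not contain another element of the same tag.
    """

    def __init__(self, selector):
        super().__init__()
        self.tag, _, self.css_class = selector.partition(".")
        self.matches = []
        self._buffer = None  # text fragments of the matched element, or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (not self.css_class or self.css_class in classes):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            self.matches.append("".join(self._buffer).strip())
            self._buffer = None


def page_urls(first_url, count):
    """Build `count` page URLs from a URL that ends in a page number."""
    base, _, number = first_url.rpartition("/")
    start = int(number)
    return [f"{base}/{n}" for n in range(start, start + count)]


# Hypothetical page content; a real crawl would fetch each URL in turn.
SAMPLE_PAGE = """
<h2 class="article-title featured">First headline</h2>
<h2 class="sidebar">Not an article</h2>
<h2 class="article-title">Second <em>headline</em></h2>
"""

extractor = TagClassExtractor("h2.article-title")
extractor.feed(SAMPLE_PAGE)
titles = extractor.matches
urls = page_urls("https://news.example.com/page/1", 3)

print(json.dumps({"titles": titles, "pages": urls}, indent=2))
```

Matching on the class list rather than the raw attribute string means an element with several classes, such as the first headline here, is still picked up.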
Tips & Limitations
- Specificity is Key: When using CSS selectors, be as specific as possible to ensure you extract only the desired data and avoid noise.
- Check Website Structure: Web page structures can change. If the scraper stops working, the website's HTML might have been updated, requiring a review of your selectors.
- Respect robots.txt: While not explicitly enforced by the tool, always be mindful of a website's robots.txt file and terms of service to avoid overloading servers or violating usage policies.
- Dynamic Content: This scraper primarily works with server-rendered HTML. Content loaded dynamically via JavaScript after the initial page load might not be fully captured without additional tools or configurations.
- Rate Limiting: Be cautious when scraping large amounts of data or crawling many pages from a single website. Implement delays or use the tool responsibly to avoid being blocked.
- Error Handling: Network issues or unexpected HTML structures can lead to errors. Consider adding error handling in your workflow if using this skill in an automated pipeline.
- Dependencies: Ensure you have the necessary libraries (beautifulsoup4, lxml) installed as indicated in the quick start guide (uv run --with beautifulsoup4 --with lxml).
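Three of the tips above (respecting robots.txt, rate limiting, and error handling) can be combined in a small wrapper around whatever fetch step an automated pipeline uses. The following is a sketch under stated assumptions, not part of the skill: USER_AGENT, build_robots, fetch_politely, and SAMPLE_ROBOTS are hypothetical names, and the delay and retry counts are arbitrary defaults. The opener and sleep parameters exist only so the function can be exercised without network access.

```python
import time
import urllib.request
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper"  # hypothetical user-agent string


def build_robots(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return robots


def fetch_politely(url, robots, delay=1.0, retries=3,
                   opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url` if robots.txt allows it, pausing before every attempt.

    Returns the response body as bytes, or None when the URL is disallowed.
    Re-raises the last error once all retries are used up.
    """
    if not robots.can_fetch(USER_AGENT, url):
        return None
    for attempt in range(1, retries + 1):
        sleep(delay * attempt)  # wait longer after each failed attempt
        try:
            with opener(url, timeout=10) as response:
                return response.read()
        except OSError:  # URLError and socket timeouts both derive from OSError
            if attempt == retries:
                raise


# Hypothetical robots.txt content; a real run would download it from the site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

robots = build_robots(SAMPLE_ROBOTS)
print(robots.can_fetch(USER_AGENT, "https://example.com/products"))   # allowed path
print(robots.can_fetch(USER_AGENT, "https://example.com/private/x"))  # disallowed path
print(robots.crawl_delay(USER_AGENT))  # delay requested by the site, if any
```

When a site publishes a Crawl-delay, passing that value as delay is a reasonable starting point; otherwise a fixed pause of a second or more between requests keeps the load on the server low.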
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-mariusfit-smart-web-scraper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, data-collection
Related Skills
content-repurposer
Repurpose any blog post or article into multiple social media formats. Input a URL or text, get X/Twitter thread, LinkedIn post, Instagram caption, email snippet, and summary. Use when asked to repurpose content, create social posts from an article, turn a blog into tweets, or generate multi-platform content.
whatsapp-faq-bot
Build and query a FAQ knowledge base from markdown files. Use when asked to create a FAQ bot, set up automatic answers, build a knowledge base, add FAQ entries, search FAQs, or answer common questions from a knowledge base. Perfect for WhatsApp business bots.
daily-business-report
Generate daily business briefings from multiple data sources. Aggregates weather, crypto prices, news headlines, system health, and calendar events into a formatted morning report. Use when asked to create a daily report, morning briefing, business summary, or status dashboard.
security-hardener
Audit and harden OpenClaw configuration for security. Scans openclaw.json for vulnerabilities, exposed credentials, insecure gateway settings, overly permissive exec rules, and missing security best practices. Use when asked to audit security, harden configuration, check for vulnerabilities, or secure an OpenClaw deployment.