smart-fetch
Fetch web pages for LLM use with markdown-first negotiation, strict output limits, cache/revalidation, and robust HTML fallback. Use for article/doc/blog scraping where token efficiency, safer ingestion, and predictable extraction behavior are important.
Why use this skill?
smart-fetch is a robust, markdown-first web scraping tool for OpenClaw that makes LLM research more efficient, with strict output limits and safety filtering.
What This Skill Does
Smart-fetch is a web ingestion tool for OpenClaw AI agents that turns raw web pages into LLM-ready text. Instead of requesting HTML by default, it uses a markdown-first negotiation strategy: it first asks the server for the page as text/markdown, and if markdown is not available, it falls back to an HTML pipeline that uses Readability to extract the main content and Turndown to convert that content to markdown. The goal is to hand your LLM clean, structured text without noisy page elements such as navigation bars, advertisements, or scripts.
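The negotiation step can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not smart-fetch's actual code: the Accept header value, the function names build_request, choose_pipeline, and fetch, and the mapping from Content-Type to a handling path are all hypothetical. The Readability/Turndown stage is represented only by the "html-fallback" label.

```python
import urllib.request

# Hypothetical Accept header: prefer markdown, but let the server answer with HTML.
ACCEPT_MARKDOWN_FIRST = "text/markdown, text/html;q=0.8, */*;q=0.1"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for markdown first via HTTP content negotiation."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_MARKDOWN_FIRST})


def choose_pipeline(content_type: str) -> str:
    """Map a response's Content-Type header to a (hypothetical) handling path."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/markdown":
        return "markdown"       # server honored the request: use the body as-is
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html-fallback"  # extract main content, then convert HTML to markdown
    return "unsupported"


def fetch(url: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Fetch a URL and report which path its body should take (performs network I/O)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return choose_pipeline(resp.headers.get("Content-Type", "")), resp.read()
```

Listing text/html with a lower q-value, rather than omitting it, lets servers that cannot produce markdown still return a page that the fallback path can handle.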
Beyond retrieval, smart-fetch enforces output controls: you can set strict character and byte limits so that a single large page cannot waste tokens or inflate costs. It also extracts metadata about each page, including severity ratings and safety flags that mark suspicious content such as potential command-injection lures or requests for API keys. These controls make it a suitable intermediary layer for automated research workflows.
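The limits and flags described above could look roughly like the following Python sketch. The function names, the regular expressions, and the severity rule (one flag is "medium", several are "high") are illustrative assumptions; smart-fetch's real rules and scoring are not documented here. The byte limit is applied to the UTF-8 encoding and drops any multi-byte character split by the cut.

```python
import re

# Hypothetical lure patterns; a production rule set would be broader and tuned.
LURE_PATTERNS = {
    "command_injection": re.compile(
        r"\b(?:curl|wget)\b[^\n|]*\|\s*(?:ba|z)?sh\b|\brm\s+-rf\b", re.IGNORECASE
    ),
    "api_key_request": re.compile(
        r"\b(?:paste|send|share|enter)\b.{0,40}\b(?:api[\s_-]?key|token|secret)\b",
        re.IGNORECASE,
    ),
}


def enforce_limits(text: str, max_chars: int, max_bytes: int) -> tuple[str, bool]:
    """Cut text to at most max_chars characters and max_bytes UTF-8 bytes.

    Returns the (possibly shortened) text and whether anything was cut.
    """
    truncated = False
    if len(text) > max_chars:
        text = text[:max_chars]
        truncated = True
    encoded = text.encode("utf-8")
    if len(encoded) > max_bytes:
        # Slice at the byte limit, then drop a trailing partial multi-byte character.
        text = encoded[:max_bytes].decode("utf-8", errors="ignore")
        truncated = True
    return text, truncated


def safety_flags(text: str) -> dict:
    """Return hypothetical safety flags and a coarse severity for fetched text."""
    flags = sorted(name for name, pattern in LURE_PATTERNS.items() if pattern.search(text))
    if len(flags) > 1:
        severity = "high"
    elif flags:
        severity = "medium"
    else:
        severity = "none"
    return {"flags": flags, "severity": severity}
```

Applying the character limit first and the byte limit second means the stricter of the two always wins.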
Installation
To integrate smart-fetch into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/che7seachen/smart-fetch
Then configure the relevant environment variables, particularly those for domain allowlisting and the cache directory, to match your infrastructure.
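Domain filtering driven by environment variables might be sketched as follows. Only SMART_FETCH_DOMAIN_BLOCKLIST is named in this document; the SMART_FETCH_DOMAIN_ALLOWLIST variable, the comma-separated format, the subdomain-matching rule, and the choice that the blocklist takes precedence are all hypothetical assumptions for illustration.

```python
import os
from urllib.parse import urlsplit


def _domains(var_name: str) -> set[str]:
    """Parse a comma-separated domain list from an environment variable (assumed format)."""
    raw = os.environ.get(var_name, "")
    return {d.strip().lower().lstrip(".") for d in raw.split(",") if d.strip()}


def _matches(host: str, domains: set[str]) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def is_allowed(
    url: str,
    allow_var: str = "SMART_FETCH_DOMAIN_ALLOWLIST",  # hypothetical variable name
    block_var: str = "SMART_FETCH_DOMAIN_BLOCKLIST",
) -> bool:
    """Blocklist wins; an empty allowlist allows every non-blocked host (assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    if _matches(host, _domains(block_var)):
        return False
    allowed = _domains(allow_var)
    return not allowed or _matches(host, allowed)
```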
Use Cases
Smart-fetch is well suited to ingesting technical documentation, blog posts, and academic papers. It fits autonomous research agents that scan multiple URLs for specific programming patterns or documentation updates. Because pages reach the LLM as markdown rather than raw HTML, the model processes fewer tokens for large pages, which lowers cost and latency and leaves more of the context window for reasoning.
Example Prompts
- "Use smart-fetch to grab the documentation from https://docs.example-framework.com/api and summarize the authentication requirements for me."
- "Search for the latest release notes at https://blog.tech-corp.com/news, ensure the output is under 5000 characters, and extract only the breaking changes."
- "Fetch the content of https://research-paper.io/deep-learning and identify if there are any security risks mentioned in the implementation section."
Tips & Limitations
To reduce repeated downloads and egress costs, use the cache-ttl flag for frequently accessed domains. In high-security environments, always configure SMART_FETCH_DOMAIN_BLOCKLIST to prevent accidental interaction with untrusted sources. Although the tool raises safety flags for lures, it remains your responsibility to sanitize the content before passing it to any system-level process. Always treat the output as untrusted text, never as executable instructions.
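Cache freshness and revalidation can be illustrated with a small Python sketch. The CacheEntry structure and helper names are hypothetical, and the sketch does not reproduce how the cache-ttl flag is actually parsed or stored. It shows the standard HTTP pattern: serve the cached copy while it is within its TTL, then revalidate with If-None-Match / If-Modified-Since and keep the cached body on a 304 Not Modified response.

```python
import time
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CacheEntry:
    body: str
    fetched_at: float               # Unix time of the last successful fetch or revalidation
    ttl_seconds: float              # hypothetical per-entry TTL, in the spirit of cache-ttl
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def is_fresh(entry: CacheEntry, now: Optional[float] = None) -> bool:
    """A cached copy can be served without contacting the server while within its TTL."""
    now = time.time() if now is None else now
    return now - entry.fetched_at < entry.ttl_seconds


def revalidation_headers(entry: CacheEntry) -> dict[str, str]:
    """Conditional-request headers for a stale entry (standard HTTP validators)."""
    headers = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def apply_response(
    entry: CacheEntry,
    status: int,
    now: float,
    body: Optional[str] = None,
    etag: Optional[str] = None,
    last_modified: Optional[str] = None,
) -> CacheEntry:
    """Update the cache after a conditional request."""
    if status == 304:
        # Not Modified: keep the cached body and restart its TTL.
        return replace(entry, fetched_at=now)
    if status == 200 and body is not None:
        # New content: replace the body and its validators.
        return CacheEntry(body, now, entry.ttl_seconds, etag, last_modified)
    raise ValueError(f"unhandled status {status}")
```

A 304 response carries no body, so revalidation costs far less bandwidth than a full refetch while still confirming that the cached copy is current.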
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-che7seachen-smart-fetch": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Flags: network-access, file-read, file-write