scraper
Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/chrisling-dev/links-to-pdfs
What This Skill Does
The scraper skill is a browser automation agent that converts web-accessible documents into PDFs. It handles public Notion pages, protected DocSend links, and standard web pages by rendering, capturing, and archiving content through a daemon-backed browser engine. Named profiles persist sessions, so the authenticated state for gated content carries across scraping runs without repeated manual logins.
Installation
To integrate this skill into your environment, use the OpenClaw hub CLI:
clawhub install openclaw/skills/skills/chrisling-dev/links-to-pdfs
Ensure you have Node.js installed on your system to support the underlying docs-scraper dependency. Once installed, the daemon will auto-start upon your first scrape request, managing browser lifecycles to optimize performance.
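A quick pre-install check can confirm that the prerequisites are on your PATH. The sketch below is illustrative only: it assumes Node.js is exposed as the `node` executable and that the hub CLI is invoked as `clawhub` (as in the install command above); the helper name `missing_prerequisites` is hypothetical.

```python
import shutil


def missing_prerequisites(commands=("node", "clawhub")):
    """Return the commands from `commands` that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("Node.js and clawhub are both available.")
```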
Use Cases
- Archiving Research: Convert ephemeral Notion documentation into static PDFs for long-term storage.
- Sales & Client Deliverables: Easily scrape DocSend presentations or proposal documents into local archives.
- Web Content Preservation: Transform complex web articles or dynamic pages into printable documents, using saved profiles to reuse an authenticated session for pages behind a login.
- Compliance & Reporting: Generate offline audit trails of web-based documents by scraping them into local storage with consistent naming and organizational structure.
Example Prompts
- "Scrape the Notion project roadmap at [URL] and save it to my local archive."
- "Download the client proposal from this DocSend link [URL] using my 'client-portal' profile."
- "Convert this documentation page to a PDF and make sure it handles the login using my saved credentials."
Tips & Limitations
- Efficiency: Always use the daemon mode (default) to keep the browser engine warm; this significantly reduces latency for subsequent scraping tasks.
- Authentication: If a scrape fails, use `docs-scraper jobs list` to check for pending authentication requests, then use the `update` command to pass credentials.
- Storage: The tool defaults to `~/.docs-scraper/output/`. Be aware that the automated cleanup service runs every hour to purge files, so move critical documents to a permanent directory if you need to retain them long-term.
- Dynamic Content: While the tool handles many modern web frameworks, pages that require complex user interactions beyond standard login flows may require manual guidance via the CLI.
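Because the output directory is purged periodically, one way to follow the storage tip is to copy finished PDFs elsewhere right after a scrape. The following is a minimal sketch, not part of the tool: the helper name `archive_pdfs` is hypothetical, and it assumes the PDFs sit directly inside `~/.docs-scraper/output/` rather than in subdirectories.

```python
import shutil
from pathlib import Path

# Default output location named in the tips above.
DEFAULT_OUTPUT_DIR = Path.home() / ".docs-scraper" / "output"


def archive_pdfs(dest_dir, src_dir=DEFAULT_OUTPUT_DIR):
    """Copy every PDF in src_dir to dest_dir and return the new paths.

    Files that already exist in dest_dir under the same name are skipped,
    so the function is safe to run repeatedly.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    if not src.is_dir():
        # Nothing has been scraped yet (or the directory was removed).
        return copied
    for pdf in sorted(src.glob("*.pdf")):
        target = dest / pdf.name
        if target.exists():
            continue
        shutil.copy2(pdf, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

For example, `archive_pdfs(Path.home() / "Documents" / "scraped-pdfs")` would copy new PDFs into a folder of your choosing; the destination shown here is only a placeholder.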
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-chrisling-dev-links-to-pdfs": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, data-collection, external-api
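If your clawhub.json already lists other plugins, pasting the fragment by hand risks overwriting them. The sketch below shows one way to merge the entry programmatically. It is an illustration under assumptions: the source does not say where clawhub.json lives, so the path is passed in explicitly, and the helper name `enable_plugin` is hypothetical.

```python
import json
from pathlib import Path

# Plugin key taken from the configuration fragment above.
PLUGIN_ID = "official-chrisling-dev-links-to-pdfs"


def enable_plugin(config_path, plugin_id=PLUGIN_ID):
    """Add or update one plugin entry in a clawhub.json file.

    Existing top-level keys and other plugin entries are left untouched.
    The file is created if it does not exist yet.
    """
    path = Path(config_path)
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        config = {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `enable_plugin("clawhub.json")` from the directory that holds the file would apply the same settings as the fragment shown earlier.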