kekik-crawler
Scrapling-only, deterministic web crawler with clean SRP architecture, presets, checkpointing, and JSONL/report outputs.
Why use this skill?
Discover Kekik-crawler, a high-performance, headless web scraping skill for OpenClaw. Efficiently extract data with presets, JSONL output, and checkpointing support.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/keyiflerolsun/kekik-crawler
What This Skill Does
Kekik-crawler is a high-performance, deterministic web crawler built for the OpenClaw ecosystem, specifically optimized for speed and structured data extraction. Unlike resource-heavy browser-based crawlers, this tool leverages the 'Scrapling' library to achieve rapid, headless data gathering without the overhead of rendering JavaScript. The architecture follows a strict Single Responsibility Principle (SRP), ensuring that crawling, parsing, and data serialization remain decoupled and maintainable. It features robust checkpointing capabilities to handle large-scale crawls and provides structured output via JSONL and comprehensive JSON reporting, making it an ideal choice for data scientists, OSINT investigators, and developers needing reliable web scraping pipelines.
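The separation of parsing and serialization described above can be illustrated with a minimal sketch. This is not the skill's actual code: it uses only the Python standard library instead of Scrapling, and the names TitleAndLinks, parse_page, and to_jsonl_line, as well as the record fields, are hypothetical.

```python
import json
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Parsing stage: collects the <title> text and every <a href>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(url, html):
    """Turn raw HTML into a plain record; this stage does no I/O."""
    parser = TitleAndLinks()
    parser.feed(html)
    # Deduplicate and sort links so the same page always yields the same record.
    return {"url": url, "title": parser.title.strip(), "links": sorted(set(parser.links))}


def to_jsonl_line(record):
    """Serialization stage: one JSON object per line with a stable key order."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False)


# Hypothetical input; in a real crawl the HTML would come from a fetch stage.
html = (
    "<html><head><title> Demo </title></head>"
    "<body><a href='/b'>b</a><a href='/a'>a</a></body></html>"
)
line = to_jsonl_line(parse_page("https://example.com/", html))
print(line)
```

Because each stage takes and returns plain values, the parser can be tested without network access and the serializer can be swapped without touching the parsing logic.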
Installation
To integrate this skill into your OpenClaw environment, use the internal skill manager:
clawhub install openclaw/skills/skills/keyiflerolsun/kekik-crawler
Ensure you have the necessary dependencies installed by running pip install -r requirements.txt within the skill directory. Once installed, the main entry point is main.py, which is orchestrated by the core/crawl_runner.py logic.
Use Cases
- OSINT & Person Research: Utilize the person-research preset to aggregate information across various domains for specific identities or aliases.
- Deep Research & Aggregation: Employ the deep-research preset to perform recursive crawls, ideal for building training datasets or comprehensive knowledge bases.
- Automated Data Pipelines: Integrate into larger workflows where you need to extract structured data into JSONL formats for ingestion into vector databases or LLM fine-tuning pipelines.
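For the pipeline use case, downstream code usually just needs to stream records out of a JSONL file. The following is a minimal sketch under stated assumptions: the helper iter_jsonl is hypothetical, and the field names "url" and "text" are illustrative only, since the skill's real output schema is not documented here.

```python
import json
import tempfile
from pathlib import Path


def iter_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


# Demo with a throwaway file; the "url" and "text" fields are assumed names.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text(
        '{"url": "https://example.com/a", "text": "alpha"}\n'
        "\n"
        '{"url": "https://example.com/b", "text": "beta"}\n',
        encoding="utf-8",
    )
    records = list(iter_jsonl(sample))

urls = [r["url"] for r in records]
print(urls)
```

Reading line by line keeps memory use flat even for large crawls, which is the main reason JSONL suits ingestion jobs better than a single large JSON array.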
Example Prompts
- "Use kekik-crawler with the person-research preset to find all mentions of 'John Doe' and save the output to my local outputs folder."
- "Perform a deep-research crawl on the tech news aggregate sites to gather content for my weekly analysis report."
- "Run the crawler on the provided URL list and ensure a full JSON report is generated for tracking purposes."
Tips & Limitations
- Efficiency: Because this tool does not render JavaScript, it avoids browser overhead and runs quickly, but it will not capture data from sites that rely solely on dynamic client-side rendering.
- Storage: Always check the outputs/ directory for your JSONL data and summary reports. Periodically clear old files to maintain disk space.
- Determinism: The tool is designed to be deterministic; if a crawl fails, simply restart the process using the existing checkpoint configuration to resume where it left off.
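The resume-from-checkpoint behavior mentioned in the tips can be sketched roughly as follows. This is an illustration of the general technique, not the skill's implementation: the checkpoint file layout, the function names, and the in-memory GRAPH that stands in for real link extraction are all assumptions.

```python
import json
import os
import tempfile
from collections import deque


def save_checkpoint(path, frontier, visited):
    """Write state to a temp file, then swap it in, so a crash never leaves a partial file."""
    state = {"frontier": list(frontier), "visited": sorted(visited)}
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)


def load_checkpoint(path, seeds):
    """Return (frontier, visited); start from the seeds when no checkpoint exists."""
    if not os.path.exists(path):
        return deque(seeds), set()
    with open(path, encoding="utf-8") as fh:
        state = json.load(fh)
    return deque(state["frontier"]), set(state["visited"])


def crawl(path, seeds, get_links, max_pages):
    """Breadth-first crawl that checkpoints after every processed page."""
    frontier, visited = load_checkpoint(path, seeds)
    processed = 0
    while frontier and processed < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        processed += 1
        # Sorted link order keeps the visit sequence reproducible across runs.
        for link in sorted(get_links(url)):
            if link not in visited and link not in frontier:
                frontier.append(link)
        save_checkpoint(path, frontier, visited)
    return visited


# Hypothetical link graph standing in for real pages.
GRAPH = {"a": ["c", "b"], "b": ["d"], "c": [], "d": []}

with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "checkpoint.json")
    # First run stops early, as if interrupted after two pages.
    first = crawl(ckpt, ["a"], lambda u: GRAPH.get(u, []), max_pages=2)
    # Second run picks up the saved frontier instead of starting over.
    resumed = crawl(ckpt, ["a"], lambda u: GRAPH.get(u, []), max_pages=10)

print(sorted(first), sorted(resumed))
```

The second call ignores its seeds because a checkpoint already exists, so pages visited in the first run are not fetched again.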
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-keyiflerolsun-kekik-crawler": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Flags: network-access, file-write, file-read