What This Skill Does

Kekik-crawler is a fast, deterministic web crawler built for the OpenClaw ecosystem and optimized for structured data extraction. Unlike resource-heavy browser-based crawlers, it uses the Scrapling library to fetch and parse pages without launching a browser or rendering JavaScript. The architecture follows the Single Responsibility Principle (SRP): crawling, parsing, and data serialization are kept in separate components so that each can be maintained on its own. Checkpointing lets large crawls resume after an interruption, and results are written as JSONL records plus a JSON summary report, which suits data scientists, OSINT investigators, and developers who need reliable web scraping pipelines.

Installation

To integrate this skill into your OpenClaw environment, use the internal skill manager:

clawhub install openclaw/skills/skills/keyiflerolsun/kekik-crawler

Install the required dependencies by running pip install -r requirements.txt inside the skill directory. Once installed, the entry point is main.py, which hands the crawl off to the logic in core/crawl_runner.py.

Use Cases

OSINT & Person Research: Utilize the person-research preset to aggregate information across various domains for specific identities or aliases.
Deep Research & Aggregation: Employ the deep-research preset to perform recursive crawls, ideal for building training datasets or comprehensive knowledge bases.
Automated Data Pipelines: Integrate into larger workflows where you need to extract structured data into JSONL formats for ingestion into vector databases or LLM fine-tuning pipelines.
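As an illustration of the pipeline use case above, here is a minimal Python sketch for reading a JSONL output file before passing it to a downstream step. The record schema is not documented here, so the key name passed to load_records (for example "url") is an assumption, and the helper names iter_jsonl and load_records are hypothetical rather than part of kekik-crawler.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
            if isinstance(record, dict):
                yield record


def load_records(path: Path, required_key: str) -> list[dict]:
    """Keep only records with a non-empty value for required_key.

    The key name is supplied by the caller because the crawler's
    actual field names are an assumption in this sketch.
    """
    return [r for r in iter_jsonl(path) if r.get(required_key)]
```

Streaming the file line by line keeps memory use flat for large crawls; only the filtered list in load_records is held in memory, and that step can be replaced by direct iteration if the output is very large.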

Example Prompts

"Use kekik-crawler with the person-research preset to find all mentions of 'John Doe' and save the output to my local outputs folder."
"Perform a deep-research crawl on tech news aggregator sites to gather content for my weekly analysis report."
"Run the crawler on the provided URL list and ensure a full JSON report is generated for tracking purposes."

Tips & Limitations

Efficiency: Because this tool does not render JavaScript, it avoids browser overhead and runs quickly, but it cannot capture content from sites that rely solely on client-side rendering.
Storage: Always check the outputs/ directory for your JSONL data and summary reports. Periodically clear old files to maintain disk space.
Determinism: The tool is designed to be deterministic. If a crawl fails, restart the process with the existing checkpoint configuration and it resumes where it left off.
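The checkpoint-and-resume idea in the last tip can be sketched in a few lines of Python. This is a generic illustration, not kekik-crawler's implementation: the checkpoint file layout (a JSON object with a "done" list), the function names, and the fetch callback are all assumptions made for the example.

```python
import json
import os
from pathlib import Path
from typing import Callable


def load_checkpoint(path: Path) -> set[str]:
    """Return the URLs already processed, or an empty set on the first run."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as handle:
        return set(json.load(handle)["done"])


def save_checkpoint(path: Path, done: set[str]) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w", encoding="utf-8") as handle:
        json.dump({"done": sorted(done)}, handle)
    os.replace(tmp, path)  # atomic when both paths are on the same filesystem


def crawl(urls: list[str], checkpoint: Path, fetch: Callable[[str], None]) -> list[str]:
    """Process URLs in a fixed order, skipping those recorded in the checkpoint."""
    done = load_checkpoint(checkpoint)
    processed = []
    for url in sorted(set(urls)):  # sorted order keeps reruns deterministic
        if url in done:
            continue
        fetch(url)  # hypothetical callback; may raise on failure
        done.add(url)
        save_checkpoint(checkpoint, done)
        processed.append(url)
    return processed
```

If fetch raises partway through, the checkpoint already lists every URL that completed, so calling crawl again with the same arguments picks up at the first unfinished URL. Saving after every URL is the simplest policy; a real crawler might batch the writes to reduce disk I/O.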
