What This Skill Does

The liteparse skill is a local-first document processing skill for OpenClaw. It lets the agent read the text of a document and view rendered pages without sending data to external APIs. Built on PDF.js and Tesseract.js, it converts PDFs, Word documents, and spreadsheets into plain text or structured JSON. Because it operates entirely offline, it suits sensitive, proprietary, or private documents that cannot be uploaded to cloud-based parsing services.

Installation

To integrate this skill into your environment, use the OpenClaw management CLI:

clawhub install openclaw/skills/skills/alfred-intel-handler-source/liteparse

The skill requires Node.js and the global lit binary, which you can install with npm install -g @llamaindex/liteparse. For file types beyond PDF, such as DOCX or images, also install LibreOffice and ImageMagick and make sure both are available on your system PATH; liteparse relies on these tools for conversion.
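Before running the skill, it can help to confirm that these tools are actually reachable on PATH. The following is a minimal sketch of such a check, not part of liteparse itself. The executable names for LibreOffice and ImageMagick are assumptions and may differ on your platform.

```python
import shutil

# Executable names below are assumptions, not taken from liteparse:
# "soffice" is LibreOffice's usual launcher; ImageMagick installs "magick"
# (v7) or "convert" (v6). On Windows, "convert" may resolve to an unrelated
# system utility, so prefer "magick" there.
REQUIRED = {"Node.js": ["node"], "liteparse CLI": ["lit"]}
OPTIONAL = {
    "LibreOffice": ["soffice", "libreoffice"],
    "ImageMagick": ["magick", "convert"],
}


def find_tool(candidates, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


def check_tools(tools, which=shutil.which):
    """Map each tool label to its resolved path (None when missing)."""
    return {label: find_tool(names, which) for label, names in tools.items()}


if __name__ == "__main__":
    for group, tools in (("required", REQUIRED), ("optional", OPTIONAL)):
        for label, path in check_tools(tools).items():
            print(f"[{group}] {label}: {path or 'NOT FOUND on PATH'}")
```

A missing optional tool only matters if you plan to parse the formats that depend on it.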

Use Cases

Information Extraction: Parsing long-form PDFs, research papers, or manuals to extract specific data points for summarization.
Vision Workflows: Generating high-fidelity screenshots of PDF pages to feed into vision-capable LLMs for layout analysis or diagram interpretation.
Batch Processing: Converting entire folders of invoices, contracts, or records into a machine-readable text format for local RAG (Retrieval-Augmented Generation) indexing.
Data Migration: Converting legacy Excel or Word files into structured JSON objects for integration with local databases or agent workflows.
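The Batch Processing use case above can also be scripted outside the agent. The sketch below plans one command per document and mirrors the input folder structure in the output folder. It assumes that the lit binary accepts a parse subcommand and prints the extracted text to standard output; verify this against lit --help before relying on it. The folder names, the extension list, and the helper names plan_batch and run_batch are illustrative.

```python
import subprocess
from pathlib import Path

# Extensions to pick up; an illustrative choice, not a list taken from liteparse.
DOC_SUFFIXES = {".pdf", ".docx", ".xlsx"}


def plan_batch(src_dir, out_dir, ocr=True):
    """Pair each document under src_dir with a command and a text output path."""
    src, out = Path(src_dir), Path(out_dir)
    if not src.is_dir():
        return []
    jobs = []
    for doc in sorted(src.rglob("*")):
        if not doc.is_file() or doc.suffix.lower() not in DOC_SUFFIXES:
            continue
        # ASSUMPTION: "lit parse <file>" writes the extracted text to stdout.
        cmd = ["lit", "parse", str(doc)]
        if not ocr:
            cmd.append("--no-ocr")  # skip OCR for PDFs with a text layer
        # Keep the original file name so report.pdf and report.docx
        # do not overwrite each other's output.
        rel = doc.relative_to(src)
        jobs.append((cmd, out / rel.parent / (doc.name + ".txt")))
    return jobs


def run_batch(jobs):
    """Execute each planned command and save its stdout to the target file."""
    for cmd, target in jobs:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(result.stdout, encoding="utf-8")


if __name__ == "__main__":
    # Dry run: print the plan without executing anything.
    for cmd, target in plan_batch("./imports", "./results"):
        print(" ".join(cmd), "->", target)
```

Separating the planning step from execution makes it possible to review the command list first and call run_batch only once it looks right.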

Example Prompts

"Extract all the text from the document 'Q3_Financials.pdf' and summarize the key revenue milestones for me."
"Take screenshots of pages 1 through 5 of the 'Project_Blueprint.pdf' and analyze the floor plan layout for me."
"Parse every document in the ./imports folder and save the output as clean text files in ./results."

Tips & Limitations

OCR Performance: OCR is enabled by default so that scanned documents can be read. If your PDF already has an embedded text layer, pass the --no-ocr flag to skip OCR, which significantly speeds up processing and reduces resource usage.
Layout Complexity: The skill handles standard documents well, but very complex multi-column layouts and handwritten forms may produce misaligned text. For mission-critical tasks involving such layouts, consider a cloud-based parser instead.
Resource Management: Large batch jobs run faster when the input and output folders are on a local SSD. Keep the DPI for screenshots at 300 for a balance between clarity and file size.
Data Privacy: Because the skill runs entirely offline, document contents are not sent to external API providers, which makes it well suited to secure environments.
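To make the DPI trade-off concrete, here is a short arithmetic sketch. It computes the pixel dimensions of a US Letter page (8.5 x 11 inches) at several DPI values, along with the uncompressed 8-bit RGB size. The page size and the uncompressed-size estimate are illustrative assumptions; real screenshot files are compressed and will be smaller.

```python
def page_pixels(width_in, height_in, dpi=300):
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_rgb_megabytes(width_px, height_px):
    """Uncompressed size at 3 bytes per pixel (8-bit RGB), in MB (10**6 bytes)."""
    return width_px * height_px * 3 / 1_000_000


if __name__ == "__main__":
    for dpi in (150, 300, 600):
        w, h = page_pixels(8.5, 11, dpi)  # US Letter, an assumed page size
        size = raw_rgb_megabytes(w, h)
        print(f"{dpi} DPI: {w} x {h} px, about {size:.1f} MB uncompressed")
```

At 300 DPI the page is 2550 x 3300 pixels. Doubling the DPI quadruples the pixel count, so memory use and file size grow much faster than the visible gain in sharpness.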
