Official Verified data analysis Safety 4/5

corpusgraph

Document ETL, entity extraction, and relationship graphing engine. Convert 1,000+ file formats into searchable, structured data with automatic entity and relationship mapping.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/aingestigate/corpusgraph

What This Skill Does

CorpusGraph is a Document ETL and entity relationship engine for OpenClaw AI agents. It transforms raw, unstructured data (PDFs, emails, spreadsheets, image files) into a queryable, structured knowledge graph. Built on the Ingestigate platform, it automatically extracts entities across 30+ categories, including sensitive identifiers such as addresses, crypto wallets, and professional contact information. Beyond parsing text, it maps relationships, co-occurrences, and document-level insights, so your agent can synthesize information across thousands of records.

Installation

To add this skill to your OpenClaw environment, run the following command in your terminal: clawhub install openclaw/skills/skills/aingestigate/corpusgraph

Before first use, generate your INGESTIGATE_TOKEN via the provided portal and set INGESTIGATE_BASE_URL in your platform's secure configuration settings. The agent detects both variables automatically, provided they are stored in the host system's environment or credential manager.
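
A quick way to confirm the two variables are visible before starting the agent is a small check like the one below. This is a minimal sketch, not part of the skill: the helper name missing_corpusgraph_vars is hypothetical, and it only inspects the process environment, so values held in a credential manager will not show up here.

```python
import os

# Variable names taken from the installation notes above.
REQUIRED_VARS = ("INGESTIGATE_TOKEN", "INGESTIGATE_BASE_URL")


def missing_corpusgraph_vars(env=None):
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_corpusgraph_vars()
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("CorpusGraph configuration looks complete.")
```

The check never prints the token itself, only the names of variables that are absent.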

Use Cases

CorpusGraph is ideal for knowledge workers, data scientists, and investigative researchers. Use it to:

  • Analyze large legal discovery bundles or corporate document archives.
  • Map organizational hierarchies or communication networks from email exports.
  • Perform cross-document audit trails to verify financial figures or compliance requirements.
  • Quickly search and summarize technical documentation or research papers.
  • Extract structured datasets (e.g., entity lists) from heterogeneous document collections.

Example Prompts

  1. "Look through the current project corpus and tell me which organizations appear most frequently in our quarterly meeting notes."
  2. "Find any documents that mention 'Project X' and list the people and email addresses associated with that project."
  3. "Search all ingested files for mentions of suspicious financial transactions and provide a summary of the context for each find."

Tips & Limitations

  • Performance: For massive datasets (10,000+ files), the initial ETL processing time will vary; use the GET /api/discover/collections endpoint to monitor job status.
  • Granularity: Ensure your queries are specific. Because the graph maps deep relationships, vague queries may return broad result sets. Use the faceted search parameters to narrow down by entity type or file format.
  • Security: Never paste credentials directly into the chat interface; always rely on the system-level configuration. If you receive a 401 error, refresh your token immediately.
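
The status check and the 401 advice above can be sketched as a short script. Several details here are assumptions rather than documented behavior: the token is sent as a Bearer header, the function names are hypothetical, and the response body is returned as raw text because its structure is not described on this page.

```python
import urllib.error
import urllib.request

# Endpoint path named in the performance tip above.
COLLECTIONS_PATH = "/api/discover/collections"


def build_collections_request(base_url, token):
    """Build (but do not send) a GET request for the collections endpoint."""
    url = base_url.rstrip("/") + COLLECTIONS_PATH
    # Assumption: Bearer authentication; the listing does not state the scheme.
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_collections(base_url, token, timeout=30):
    """Send the request and return the raw response body as text."""
    request = build_collections_request(base_url, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # Matches the security tip: a 401 means the token needs refreshing.
            raise RuntimeError("401 from CorpusGraph: refresh INGESTIGATE_TOKEN") from err
        raise
```

To use it, read both values from the environment, for example fetch_collections(os.environ["INGESTIGATE_BASE_URL"], os.environ["INGESTIGATE_TOKEN"]), so that no credential is typed into a chat or hard-coded in a script.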

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-aingestigate-corpusgraph": {
      "enabled": true,
      "auto_update": true
    }
  }
}
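
If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The sketch below merges the entry instead. It is illustrative only: the helper names are hypothetical, and the location of clawhub.json is left to the caller because this page does not say where the file lives.

```python
import json
from pathlib import Path

# Key and values copied from the configuration snippet above.
PLUGIN_KEY = "official-aingestigate-corpusgraph"
PLUGIN_ENTRY = {"enabled": True, "auto_update": True}


def enable_corpusgraph(config):
    """Return a copy of `config` with the CorpusGraph entry added under "plugins"."""
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))  # keep any existing plugins
    plugins[PLUGIN_KEY] = dict(PLUGIN_ENTRY)
    updated["plugins"] = plugins
    return updated


def update_config_file(path):
    """Read the JSON file at `path` (if present), add the entry, and write it back."""
    config_path = Path(path)
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    merged = enable_corpusgraph(config)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

The merge works on copies, so the original dictionary is left untouched and other plugin entries survive the update.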

Tags (AI)

#data-etl #entity-extraction #graph-database #document-analysis #knowledge-graph
Safety Score: 4/5

Flags: file-read, external-api