corpusgraph
Document ETL, entity extraction, and relationship graphing engine. Converts files in 1,000+ formats into searchable, structured data with automatic entity and relationship mapping.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/aingestigate/corpusgraph
What This Skill Does
CorpusGraph is a Document ETL and entity relationship engine for OpenClaw AI agents. It transforms raw, unstructured data (standard PDFs, emails, complex spreadsheets, image files) into a queryable, structured knowledge graph. Built on the Ingestigate platform, CorpusGraph automatically extracts entities across 30+ categories, including sensitive identifiers such as addresses, crypto wallets, and professional contact information. Beyond parsing text, it maps relationships, co-occurrences, and document-level insights, so your agent can synthesize information across thousands of records.
Installation
To integrate this skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/aingestigate/corpusgraph
Ensure that you have generated your INGESTIGATE_TOKEN via the provided portal and populated the INGESTIGATE_BASE_URL within your platform's secure configuration settings. The agent will automatically detect these variables, provided they are stored in the host system's environment or credential manager.
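As a minimal sketch, a host-side script could confirm that both variables are present before the agent runs. Only the names INGESTIGATE_TOKEN and INGESTIGATE_BASE_URL come from this page; the helper name load_ingestigate_config and its validation rules are illustrative assumptions, not part of CorpusGraph.

```python
import os
from urllib.parse import urlparse


def load_ingestigate_config(env=None):
    """Return (base_url, token) from the environment, or raise ValueError.

    Hypothetical helper: only the two variable names come from the skill page.
    """
    env = os.environ if env is None else env
    token = env.get("INGESTIGATE_TOKEN", "").strip()
    base_url = env.get("INGESTIGATE_BASE_URL", "").strip().rstrip("/")

    # Report every missing variable at once rather than failing on the first.
    missing = [
        name
        for name, value in (
            ("INGESTIGATE_TOKEN", token),
            ("INGESTIGATE_BASE_URL", base_url),
        )
        if not value
    ]
    if missing:
        raise ValueError("missing configuration: " + ", ".join(missing))

    # Assumed sanity check: the base URL should be an absolute http(s) URL.
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("INGESTIGATE_BASE_URL must be an http(s) URL")
    return base_url, token
```

Passing a plain dictionary instead of relying on os.environ keeps the check easy to exercise without touching real credentials.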
Use Cases
CorpusGraph is ideal for knowledge workers, data scientists, and investigative researchers. Use it to:
- Analyze large legal discovery bundles or corporate document archives.
- Map organizational hierarchies or communication networks from email exports.
- Perform cross-document audit trails to verify financial figures or compliance requirements.
- Quickly search and summarize technical documentation or research papers.
- Extract structured datasets (e.g., entity lists) from heterogeneous document collections.
Example Prompts
- "Look through the current project corpus and tell me which organizations appear most frequently in our quarterly meeting notes."
- "Find any documents that mention 'Project X' and list the people and email addresses associated with that project."
- "Search all ingested files for mentions of suspicious financial transactions and provide a summary of the context for each find."
Tips & Limitations
- Performance: For massive datasets (10,000+ files), the initial ETL processing time will vary; use the GET /api/discover/collections endpoint to monitor job status.
- Granularity: Ensure your queries are specific. Because the graph maps deep relationships, vague queries may return broad result sets. Use the faceted search parameters to narrow down by entity type or file format.
- Security: Never paste credentials directly into the chat interface; always rely on the system-level configuration. If you receive a 401 error, refresh your token immediately.
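The performance and security tips above can be sketched together: poll the collections endpoint and treat a 401 as a signal to refresh the token. Only the path GET /api/discover/collections comes from this page. The Bearer authorization scheme, the JSON response, and the function names are assumptions, and because the response shape is not documented here, the caller supplies the is_done check.

```python
import json
import time
import urllib.error
import urllib.request

COLLECTIONS_PATH = "/api/discover/collections"  # endpoint named in the tips above


def build_request(base_url, token):
    """Build the GET request. The Bearer scheme is an assumption."""
    url = base_url.rstrip("/") + COLLECTIONS_PATH
    return urllib.request.Request(
        url,
        headers={"Authorization": "Bearer " + token, "Accept": "application/json"},
        method="GET",
    )


def fetch_collections(base_url, token, opener=urllib.request.urlopen):
    """Return the decoded JSON body; raise PermissionError on HTTP 401."""
    try:
        with opener(build_request(base_url, token), timeout=30) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # Per the security tip: a 401 means the token needs refreshing.
            raise PermissionError("401: refresh INGESTIGATE_TOKEN") from err
        raise


def poll(base_url, token, is_done, interval=15.0, max_attempts=40, **kwargs):
    """Call the endpoint until is_done(payload) is true or attempts run out."""
    for _ in range(max_attempts):
        payload = fetch_collections(base_url, token, **kwargs)
        if is_done(payload):
            return payload
        time.sleep(interval)
    raise TimeoutError("collections still processing after polling limit")
```

The token is read from configuration and passed in as an argument, so it never needs to appear in a prompt or chat message.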
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-aingestigate-corpusgraph": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, external-api
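If clawhub.json already holds other settings, pasting the plugin fragment by hand risks overwriting them. The sketch below merges the entry instead. It assumes clawhub.json is plain JSON with a top-level "plugins" object, as the fragment on this page suggests; the helper name enable_plugin is hypothetical.

```python
import json
from pathlib import Path

PLUGIN_KEY = "official-aingestigate-corpusgraph"  # key shown in the fragment on this page


def enable_plugin(config_path):
    """Add or update the CorpusGraph entry, preserving other settings."""
    path = Path(config_path)
    # Start from the existing file if there is one, otherwise an empty config.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    entry = plugins.setdefault(PLUGIN_KEY, {})
    entry.update({"enabled": True, "auto_update": True})
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling enable_plugin("clawhub.json") leaves any other plugin entries untouched and only sets the two flags for this skill.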