
polaris-datainsight-doc-extract

Extract structured data from Office documents (DOCX, PPTX, XLSX, HWP, HWPX) using the Polaris AI DataInsight Doc Extract API. Use when the user wants to parse, analyze, or extract text, tables, charts, images, or shapes from document files. Invoke this skill whenever the user mentions extracting content from Word, PowerPoint, Excel, HWP, or HWPX files, wants to parse document structure, needs to convert document data for RAG pipelines, or asks about reading tables, charts, or text from Office-format documents — even if they don't explicitly mention "DataInsight" or "Polaris".


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/jacob-g-park/polaris-datainsight-doc-extract

Polaris AI DataInsight — Doc Extract Skill

Use the Polaris AI DataInsight Doc Extract API to extract text, images, tables, charts, shapes, equations, and more from Word, PowerPoint, Excel, HWP, and HWPX files, returning everything as a structured unifiedSchema JSON. A single API call gives you the full document structure without any manual parsing.


When to Use This Skill

  • The user wants to extract text, tables, charts, or images from DOCX, PPTX, XLSX, HWP, or HWPX files
  • The user needs to understand a document's structure (page count, element types, position data, etc.)
  • The extracted data will be used in a RAG pipeline, data analysis workflow, or automation task
  • Table data needs to be converted to CSV, or chart data needs to be broken down into series and labels
  • The user needs to parse special elements like headers, footers, equations, or shapes
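
Once the cell text is in hand, the CSV case above is mostly serialization. Here is a minimal sketch. It assumes you have already pulled each table's cells out of the unifiedSchema JSON into a list of rows. This page does not show the schema's table field names, so that step is left out. `rows_to_csv` is a hypothetical helper, not part of the API.

```python
import csv
import io


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize a table (a list of rows, each a list of cell strings) to CSV text.

    Assumption: the cell strings were already extracted from a table element
    in the unifiedSchema JSON; the schema's field names are not handled here.
    """
    buffer = io.StringIO()
    # csv.writer quotes cells that contain commas, quotes, or newlines
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `rows_to_csv([["Region", "Q1"], ["North", "1,200"]])` keeps the thousands separator intact by quoting that cell.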

What This Skill Does

  1. Authentication — Authenticates with the Polaris DataInsight API via the x-po-di-apikey header.
  2. Upload and extract — Sends the file as a multipart/form-data POST request and extracts the full document structure.
  3. Parse ZIP response — The API returns a ZIP file; extract it and load the unifiedSchema JSON inside.
  4. Deliver structured data — Returns JSON organized by page and element type (text, table, chart, image, shape, equation, etc.).
  5. Support multiple usage patterns — Handles full text extraction, table-to-CSV conversion, RAG chunk generation, and more.
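
Step 3 can be done entirely in memory with the standard library. Here is a sketch that assumes the ZIP holds one unifiedSchema JSON file. The page does not give that file's name, so the sketch takes the first `.json` entry in archive order. `load_unified_schema` is a hypothetical name.

```python
import io
import json
import zipfile


def load_unified_schema(zip_bytes: bytes) -> dict:
    """Open the ZIP returned by Doc Extract and parse the JSON document inside.

    Assumption: the first entry ending in ".json" is the unifiedSchema file.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        json_names = [n for n in archive.namelist() if n.lower().endswith(".json")]
        if not json_names:
            raise ValueError("no .json file found in the response ZIP")
        # json.load accepts a binary file object and detects UTF-8/16/32
        with archive.open(json_names[0]) as fp:
            return json.load(fp)
```

Pass it `response.content` from the upload request. Any non-JSON entries in the archive are ignored.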

How to Use

Prerequisites

Get an API Key: Sign up at https://datainsight.polarisoffice.com and generate your API key.

Authentication: Include the API key as a header on every request.

Header: x-po-di-apikey: $POLARIS_DATAINSIGHT_API_KEY

Set the environment variable:

export POLARIS_DATAINSIGHT_API_KEY="your-api-key-here"
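
Reading the key from the environment keeps it out of source code. The sketch below builds the header from the variable above. `auth_headers` is a hypothetical helper name. Failing fast on a missing key is a design choice, not something the API requires.

```python
import os

API_KEY_ENV = "POLARIS_DATAINSIGHT_API_KEY"


def auth_headers() -> dict[str, str]:
    """Build the x-po-di-apikey header from the environment variable."""
    api_key = os.environ.get(API_KEY_ENV, "").strip()
    if not api_key:
        # Stop before sending a request that would fail authentication
        raise RuntimeError(f"{API_KEY_ENV} is not set")
    return {"x-po-di-apikey": api_key}
```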

Limits

Supported formats: HWP, HWPX, DOCX, PPTX, XLSX
Max file size: 25 MB
Timeout: 10 minutes
Rate limit: 10 requests per minute
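
Checking these limits on the client side avoids spending a request, and a rate-limit slot, on a file the API is likely to reject. The sketch makes two assumptions. First, "25 MB" is treated conservatively as 25,000,000 bytes. Second, the rate limit is respected by spacing calls at least 6 seconds apart. `check_file` and `Throttle` are hypothetical names.

```python
import os
import time

# Values from the limits listed above
SUPPORTED_EXTENSIONS = {".hwp", ".hwpx", ".docx", ".pptx", ".xlsx"}
MAX_FILE_BYTES = 25_000_000  # assumption: conservative reading of "25 MB"
MIN_INTERVAL_S = 60 / 10     # 10 requests per minute -> one call every 6 s


def check_file(path: str) -> None:
    """Raise ValueError if the file breaks the documented format or size limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"file is {size} bytes; limit is {MAX_FILE_BYTES}")


class Throttle:
    """Keep successive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = MIN_INTERVAL_S) -> None:
        self.min_interval = min_interval
        self._last: float | None = None

    def wait(self) -> None:
        # Sleep only for whatever part of the interval has not yet elapsed
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Call `check_file(path)`, then `throttle.wait()` on a shared `Throttle` instance, before each upload. Fixed spacing is simpler than a sliding-window counter, but it is slightly slower for short bursts.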

Basic Usage

Endpoint:

POST https://datainsight-api.polarisoffice.com/api/v1/datainsight/doc-extract

Extract a document with Python:

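import io
import json
import zipfile

import requests

API_URL = "https://datainsight-api.polarisoffice.com/api/v1/datainsight/doc-extract"


def extract_document(file_path: str, api_key: str) -> dict:
    # Upload as multipart/form-data; allow for the documented 10-minute timeout
    with open(file_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"x-po-di-apikey": api_key},
            files={"file": f},
            timeout=600,
        )

    if response.status_code != 200:
        raise RuntimeError(f"API error: {response.status_code} - {response.text}")

    # The response body is a ZIP archive containing the unifiedSchema JSON.
    # Assumption: the first .json entry in the archive is that document.
    with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
        json_names = [n for n in archive.namelist() if n.lower().endswith(".json")]
        if not json_names:
            raise RuntimeError("No JSON file found in the response ZIP")
        with archive.open(json_names[0]) as fp:
            return json.load(fp)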
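
For the RAG-chunk pattern, the text taken from the result of `extract_document` still has to be cut into retrieval-sized pieces. Here is a character-based sketch with overlap. It assumes you have already joined the document's text elements into one string, because this page does not cover how to walk the unifiedSchema JSON. `chunk_text` and its defaults are hypothetical, not API parameters.

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Each chunk after the first repeats the last `overlap` characters of the
    previous one, so text near a boundary appears in both chunks.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once this chunk reaches the end of the text
        if start + max_chars >= len(text):
            break
    return chunks
```

A production pipeline might prefer to split on the page or element boundaries the JSON already provides. Fixed-size windows are just the simplest baseline.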

Metadata

Stars: 2032
Views: 0
Updated: 2026-03-05
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-jacob-g-park-polaris-datainsight-doc-extract": {
      "enabled": true,
      "auto_update": true
    }
  }
}
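
If you would rather script this step than paste it by hand, the sketch below uses only the standard library. It assumes clawhub.json is a plain JSON object at a path you supply. It merges the entry so that other plugins and settings are kept. `enable_plugin` is a hypothetical helper, not part of the clawhub CLI.

```python
import json
from pathlib import Path

# Plugin ID as given in the configuration snippet above
PLUGIN_ID = "official-jacob-g-park-polaris-datainsight-doc-extract"


def enable_plugin(config_path: str, plugin_id: str = PLUGIN_ID) -> dict:
    """Merge the plugin entry into clawhub.json, keeping existing settings."""
    path = Path(config_path)
    # Start from the existing config if the file exists, else an empty object
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    entry = config.setdefault("plugins", {}).setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```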
Safety note: ClawKit audits metadata but not runtime behavior. Use with caution.