upstage-document-parse
Parse documents (PDF, images, DOCX, PPTX, XLSX, HWP) using Upstage Document Parse API. Extracts text, tables, figures, and layout elements with bounding boxes. Use when user asks to parse, extract, or analyze document content, convert documents to markdown/HTML, or extract structured data from PDFs and images.
Why use this skill?
Use the OpenClaw upstage-document-parse skill to extract text, tables, and layout from PDFs, images, and office documents into structured markdown.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/upstage-deployment/upstage-document-parse
What This Skill Does
The upstage-document-parse skill exposes Upstage's Document Parse API inside the OpenClaw environment, turning unstructured document files into machine-readable data. Using vision and OCR models, it parses PDF, DOCX, XLSX, PPTX, HWP, and common image formats and extracts text, layout structure, tables, and figures. You can switch between standard and enhanced modes depending on how complex the layout is, and you can optionally request bounding-box coordinates so that every extracted element keeps its position on the page.
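To make the request concrete, here is a minimal standard-library Python sketch of a synchronous Document Parse call. The endpoint URL, the Bearer-token header, and the form field names ('model', 'mode', 'ocr', 'coordinates', 'output_formats', 'document') reflect our reading of Upstage's public API reference and are assumptions, not something this skill documents. The helper names encode_multipart, build_fields, and parse_document are hypothetical.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Assumed endpoint; check the current Upstage API reference before use.
UPSTAGE_URL = "https://api.upstage.ai/v1/document-digitization"


def encode_multipart(fields: dict[str, str], file_field: str, filename: str,
                     file_bytes: bytes) -> tuple[bytes, str]:
    """Encode plain text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The file part; the server is assumed to detect the format from the content/filename.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_fields(mode: str = "standard", ocr: str = "auto",
                 coordinates: bool = True) -> dict[str, str]:
    """Form fields for one request (field names are assumptions)."""
    return {
        "model": "document-parse",
        "mode": mode,                                # "standard" or "enhanced"
        "ocr": ocr,                                  # "auto" or "force"
        "coordinates": str(coordinates).lower(),     # bounding boxes on/off
        "output_formats": json.dumps(["markdown", "html"]),
    }


def parse_document(path: str, api_key: str, **options) -> dict:
    """Synchronously parse one file and return the decoded JSON response."""
    file_path = Path(path).expanduser()
    body, content_type = encode_multipart(
        build_fields(**options), "document", file_path.name, file_path.read_bytes()
    )
    request = urllib.request.Request(
        UPSTAGE_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())
```

For example, parse_document("~/Documents/quarterly_report.pdf", api_key, mode="enhanced") would return the parsed JSON; under the assumed response shape, its "content" object holds the markdown and HTML renderings.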
Installation
To begin using this skill, make sure the OpenClaw CLI is installed, then run openclaw install upstage-document-parse. After installation, configure your Upstage API key to authorize the skill, either by running openclaw config set skills.entries.upstage-document-parse.apiKey "your-api-key" or by adding the key from the Upstage Console to your ~/.openclaw/openclaw.json configuration file.
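If you call the API from your own scripts, you may want to reuse the key stored for the skill. The sketch below assumes that the dotted key skills.entries.upstage-document-parse.apiKey maps to nested JSON objects in ~/.openclaw/openclaw.json; that layout is inferred from the config command, not documented here. The UPSTAGE_API_KEY environment-variable fallback and the load_api_key name are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Optional

SKILL_NAME = "upstage-document-parse"


def load_api_key(config_path: Optional[Path] = None) -> str:
    """Return the skill's API key from the OpenClaw config, else from the environment."""
    if config_path is None:
        config_path = Path("~/.openclaw/openclaw.json").expanduser()
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # Assumed nesting: {"skills": {"entries": {"<skill>": {"apiKey": ...}}}}
        key = (config.get("skills", {}).get("entries", {})
               .get(SKILL_NAME, {}).get("apiKey"))
        if key:
            return key
    # Hypothetical fallback variable; the skill itself does not document one.
    key = os.environ.get("UPSTAGE_API_KEY")
    if key:
        return key
    raise RuntimeError("Upstage API key not found in OpenClaw config or UPSTAGE_API_KEY")
```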
Use Cases
This skill suits professionals digitizing paper records, analysts extracting data from financial tables in PDFs, and developers building RAG (Retrieval-Augmented Generation) pipelines that need structured markdown or HTML input. It handles legacy document formats such as HWP, converts long-form reports into machine-readable formats, and isolates visual elements such as figures and charts for downstream data processing.
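For a RAG pipeline, one simple approach is to group the parsed elements into per-page markdown chunks while setting tables, figures, and charts aside for separate processing. This sketch assumes a response with an "elements" list whose items carry "category", "page", and "content" → "markdown"; those names and the category values are our understanding of Document Parse output and should be checked against real responses. Per-page chunking is just one illustrative strategy, and split_for_rag is a hypothetical helper.

```python
from collections import defaultdict

# Assumed category names; verify against actual Document Parse output.
VISUAL_CATEGORIES = {"table", "figure", "chart"}
SKIP_CATEGORIES = {"header", "footer"}  # page furniture that adds noise to retrieval


def split_for_rag(response: dict) -> tuple[dict[int, str], list[dict]]:
    """Group element markdown into per-page chunks and collect visual elements separately."""
    pages: dict[int, list[str]] = defaultdict(list)
    visuals: list[dict] = []
    for element in response.get("elements", []):
        category = element.get("category", "")
        if category in SKIP_CATEGORIES:
            continue
        markdown = element.get("content", {}).get("markdown", "")
        # Keep every visual element, even when it has no text rendering.
        if category in VISUAL_CATEGORIES:
            visuals.append({"page": element.get("page"), "category": category,
                            "markdown": markdown})
        if markdown:
            pages[element.get("page", 0)].append(markdown)
    chunks = {page: "\n\n".join(parts) for page, parts in sorted(pages.items())}
    return chunks, visuals
```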
Example Prompts
- "Parse this PDF located at ~/Documents/quarterly_report.pdf and convert the tables into CSV format."
- "Extract all text and images from the scanned invoice in ~/Invoices/scan_001.jpg using enhanced mode."
- "Convert the document at ~/Manuals/project_specs.docx into clean markdown format for my notes."
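The first prompt asks for tables as CSV. Assuming each parsed table is available as an HTML table fragment (for instance from an HTML output format), a small standard-library converter like the one below could turn it into CSV. It flattens rowspan and colspan, so merged cells are not reproduced; table_html_to_csv and TableRowCollector are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List, Optional


class TableRowCollector(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row (ignores rowspan/colspan)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace and line breaks inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_html_to_csv(table_html: str) -> str:
    """Convert one HTML table fragment to CSV text."""
    collector = TableRowCollector()
    collector.feed(table_html)
    collector.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(collector.rows)
    return buffer.getvalue()
```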
Tips & Limitations
For best results, use 'enhanced' mode for highly complex layouts or overlapping elements. Synchronous requests are recommended for documents under 20 pages to keep response times stable; longer documents should use asynchronous requests, which support up to 1000 pages. For scanned documents with low text clarity, set the 'ocr' parameter to 'force' ('ocr=force'). Chart recognition and multi-page table merging are currently in beta, so verify their output before relying on it for critical data.
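The tips above amount to a small decision rule. This sketch encodes the 20-page and 1000-page thresholds stated here; the endpoint URLs and field names are assumptions based on our reading of Upstage's API reference, and plan_request and ParsePlan are hypothetical. Polling for asynchronous results is not shown.

```python
from dataclasses import dataclass, field

# Assumed endpoints; verify against the current Upstage API reference.
SYNC_URL = "https://api.upstage.ai/v1/document-digitization"
ASYNC_URL = "https://api.upstage.ai/v1/document-digitization/async"

SYNC_PAGE_LIMIT = 20      # synchronous requests recommended below this page count
ASYNC_PAGE_LIMIT = 1000   # maximum pages for asynchronous tasks


@dataclass
class ParsePlan:
    url: str
    fields: dict = field(default_factory=dict)


def plan_request(page_count: int, complex_layout: bool = False,
                 low_quality_scan: bool = False) -> ParsePlan:
    """Choose endpoint and options following the tips above."""
    if page_count > ASYNC_PAGE_LIMIT:
        raise ValueError(f"{page_count} pages exceeds the {ASYNC_PAGE_LIMIT}-page limit")
    url = SYNC_URL if page_count < SYNC_PAGE_LIMIT else ASYNC_URL
    fields = {
        "model": "document-parse",
        "mode": "enhanced" if complex_layout else "standard",
        "ocr": "force" if low_quality_scan else "auto",
    }
    return ParsePlan(url, fields)
```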
Metadata
Paste this into your clawhub.json to enable the plugin:
{
  "plugins": {
    "official-upstage-deployment-upstage-document-parse": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, external-api