deepseek-ocr
Expert skill for using DeepSeek-OCR, a vision-language model for optical character recognition with context optical compression, supporting documents, PDFs, and images.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/adisinghstudent/deepseek-ocr
What This Skill Does
DeepSeek-OCR is a vision-language model designed for Optical Character Recognition (OCR). It uses "Contexts Optical Compression" to handle complex documents, PDFs, and images. Unlike traditional OCR engines, it takes structure, context, and layout into account, so it can convert raw images into clean, structured markdown or extracted data. It can be run through vLLM for high-throughput production workloads or through HuggingFace for flexible prototyping.
Installation
To integrate this skill into your environment, use the OpenClaw CLI:
clawhub install openclaw/skills/skills/adisinghstudent/deepseek-ocr
Ensure your system meets the software requirements: CUDA 11.8+ and PyTorch 2.6.0. A conda environment with Python 3.12.9 is recommended. Install vLLM (version 0.8.5 or higher) and flash-attention to get the best inference speed. For the full dependency list, refer to the requirements.txt file provided in the model repository.
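Before loading the model, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the skill itself: the helper names `parse_version` and `check_environment` are hypothetical, and treating PyTorch 2.6.0 as a minimum rather than an exact pin is an assumption.

```python
from importlib import metadata

# Versions taken from the requirements stated above.
# Assumption: both are treated as minimums, not exact pins.
REQUIRED = {"torch": (2, 6, 0), "vllm": (0, 8, 5)}


def parse_version(text):
    """Turn '2.6.0+cu118' into (2, 6, 0), ignoring local and pre-release suffixes."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment(required=REQUIRED):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_version(installed) >= minimum)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "missing or too old"
        print(f"{name}: {installed} ({status})")
```

The check only reads package metadata, so it runs without a GPU and without importing torch or vLLM.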
Use Cases
This skill is well suited to:
- Digitizing historical documents or handwritten notes into machine-readable markdown.
- Extracting tabular data from receipts, invoices, or financial statements for automated processing.
- Automating document analysis pipelines where document-to-markdown conversion is required for downstream LLM agents.
- Visual grounding tasks where specific elements within a document or image need to be located and categorized.
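For the visual grounding use case, downstream code usually needs the located elements as labels with bounding boxes. The sketch below assumes grounded output marks each element as `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>`. That tag layout is an assumption based on the model's published examples and should be checked against real output. The function name `parse_grounding` is hypothetical.

```python
import ast
import re

# Assumed tag layout: <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
GROUNDING_PATTERN = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL
)


def parse_grounding(text):
    """Return a list of (label, [x1, y1, x2, y2]) pairs found in text."""
    results = []
    for label, raw_boxes in GROUNDING_PATTERN.findall(text):
        try:
            boxes = ast.literal_eval(raw_boxes.strip())
        except (ValueError, SyntaxError):
            continue  # skip malformed coordinates rather than guess
        if not isinstance(boxes, (list, tuple)):
            continue
        for box in boxes:
            is_box = (
                isinstance(box, (list, tuple))
                and len(box) == 4
                and all(isinstance(v, (int, float)) for v in box)
            )
            if is_box:
                results.append((label.strip(), [int(v) for v in box]))
    return results


if __name__ == "__main__":
    sample = (
        "<|ref|>title<|/ref|><|det|>[[10, 20, 300, 60]]<|/det|>\n"
        "# Invoice\n"
        "<|ref|>table<|/ref|><|det|>[[10, 80, 900, 400]]<|/det|>"
    )
    for label, box in parse_grounding(sample):
        print(label, box)
```

Malformed entries are skipped instead of repaired, so a partially garbled response still yields the elements that parsed cleanly.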
Example Prompts
- "Perform free OCR on the uploaded invoice and output the content as a clean markdown table."
- "Analyze this multi-page PDF scan using grounding mode and extract all headers, subheaders, and bullet points into a structured markdown document."
- "Convert this handwritten form into an organized JSON format while maintaining the logical structure of the fields."
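When calling the model directly rather than through an agent, natural-language requests like the ones above have to be turned into model prompts. The sketch below is one way to do that. The `<|grounding|>` tag is the one named in this document. The `<image>` placeholder and the newline layout are assumptions drawn from the model repository's examples, and `build_prompt` is a hypothetical helper, so verify both against the repository before relying on them.

```python
IMAGE_TOKEN = "<image>"          # assumed image placeholder
GROUNDING_TOKEN = "<|grounding|>"  # grounding tag named in this document


def build_prompt(instruction, grounding=False):
    """Compose a prompt: image placeholder, optional grounding tag, instruction."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must not be empty")
    prefix = GROUNDING_TOKEN if grounding else ""
    return f"{IMAGE_TOKEN}\n{prefix}{instruction}"


if __name__ == "__main__":
    # Plain OCR without layout information.
    print(repr(build_prompt("Free OCR.")))
    # Layout-aware conversion using grounding mode.
    print(repr(build_prompt("Convert the document to markdown.", grounding=True)))
```

Keeping the grounding switch as a flag makes it easy to compare the two modes on the same document.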
Tips & Limitations
- Performance: Always use vLLM for production workloads to take advantage of high-throughput features.
- Quality: For complex document structures, use the <|grounding|> prompt mode to improve structural accuracy.
- Resource Usage: This model is vision-heavy; ensure your GPU has enough VRAM to handle high-resolution image inputs and long token contexts.
- Whitelisting: When the output contains tables, add the token IDs of the table tags to the whitelist so that repeated tags are not filtered out during inference, which can truncate or corrupt the table.
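The whitelisting tip is easiest to see with a repetition filter in hand. The sketch below assumes the whitelist belongs to an n-gram repetition filter that bans tokens which would repeat an earlier n-gram. That reading is an assumption, the model's own inference code may work differently, and the token IDs used here are hypothetical placeholders rather than real vocabulary IDs.

```python
def banned_next_tokens(history, ngram_size, whitelist=frozenset()):
    """Return token IDs that would complete an n-gram already seen in history.

    Whitelisted IDs are never banned, so legitimately repetitive tokens
    (for example table-cell tags) can still be generated.
    """
    if ngram_size < 2 or len(history) < ngram_size - 1:
        return set()
    prefix = tuple(history[len(history) - (ngram_size - 1):])
    banned = set()
    # Slide over every earlier n-gram; if its first n-1 tokens match the
    # current suffix, its last token would create a repeat.
    for start in range(len(history) - ngram_size + 1):
        window = history[start:start + ngram_size]
        if tuple(window[:-1]) == prefix:
            banned.add(window[-1])
    return banned - set(whitelist)


if __name__ == "__main__":
    TD_OPEN, TD_CLOSE = 9001, 9002  # hypothetical IDs for table-cell tags
    # Two table cells with identical content (token 5).
    history = [TD_OPEN, 5, TD_CLOSE, TD_OPEN, 5]
    print(banned_next_tokens(history, 3))
    print(banned_next_tokens(history, 3, whitelist={TD_OPEN, TD_CLOSE}))
```

Without the whitelist, the closing tag of the second cell is banned because it would repeat the first cell's n-gram, which is how a table can end up truncated. With the whitelist, the tag stays available.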
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-adisinghstudent-deepseek-ocr": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-read, code-execution