Official Verified ai models Safety 4/5

deepseek-ocr

Expert skill for using DeepSeek-OCR, a vision-language model for optical character recognition that uses context optical compression and supports documents, PDFs, and images.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/adisinghstudent/deepseek-ocr

What This Skill Does

DeepSeek-OCR is a vision-language model for Optical Character Recognition (OCR). It uses "Contexts Optical Compression" to process complex documents, PDFs, and images. Unlike traditional OCR engines, it accounts for structure, context, and layout, so it can convert raw images into clean, structured markdown or extracted data. It can be served through vLLM for high-throughput production use or run through HuggingFace for flexible prototyping.

Installation

To integrate this skill into your environment, use the OpenClaw CLI:

clawhub install openclaw/skills/skills/adisinghstudent/deepseek-ocr

Ensure your system meets the software requirements: CUDA 11.8+ and PyTorch 2.6.0. A conda environment with Python 3.12.9 is recommended. For fast inference, install vLLM (version 0.8.5 or higher) and flash-attention. For the full dependency list, refer to the requirements.txt file provided in the model repository.
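The version requirements above can be checked before running anything heavy. The following is a minimal sketch, not part of the skill itself: the helper names (`parse_version`, `meets_minimum`, `report`) are hypothetical, and it assumes the packages are installed under the distribution names `torch` and `vllm`. It reads versions from package metadata instead of importing the libraries, so it also runs on a machine without a GPU.

```python
import re
import sys
from importlib import metadata

# Versions taken from the requirements stated above. PyTorch is listed there
# as an exact version (2.6.0); this sketch only flags older installs.
REQUIRED = {"torch": "2.6.0", "vllm": "0.8.5"}


def parse_version(text):
    """Turn a string such as '2.6.0+cu118' into (2, 6, 0)."""
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def report():
    """Return one human-readable status line per checked component."""
    v = sys.version_info
    lines = [f"python {v.major}.{v.minor}.{v.micro}"]
    for name, minimum in REQUIRED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed (need {minimum})")
            continue
        ok = meets_minimum(installed, minimum)
        status = "ok" if ok else f"too old, need {minimum}"
        lines.append(f"{name} {installed}: {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

The script does not verify the CUDA toolkit or flash-attention; those still need to be checked against the model repository's instructions.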

Use Cases

This skill is well suited to:

Digitizing historical documents or handwritten notes into machine-readable markdown.
Extracting tabular data from receipts, invoices, or financial statements for automated processing.
Automating document analysis pipelines where document-to-markdown conversion is required for downstream LLM agents.
Visual grounding tasks where specific elements within a document or image need to be located and categorized.

Example Prompts

"Perform free OCR on the uploaded invoice and output the content as a clean markdown table."
"Analyze this multi-page PDF scan using grounding mode and extract all headers, subheaders, and bullet points into a structured markdown document."
"Convert this handwritten form into an organized JSON format while maintaining the logical structure of the fields."
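The prompts above are requests addressed to the agent; the model itself is typically driven by a much shorter prompt string. As a rough sketch of how such a string might be assembled, the helper below prepends an image placeholder and, optionally, the `<|grounding|>` tag mentioned in the tips. The function name `build_prompt` is hypothetical, and both the `<image>` placeholder and the instruction wording are assumptions that should be verified against the model repository's documentation.

```python
GROUNDING_TAG = "<|grounding|>"  # prompt mode named on this page
IMAGE_PLACEHOLDER = "<image>"    # assumption: marker for the image slot


def build_prompt(instruction, grounding=False):
    """Compose a model prompt: image slot, optional grounding tag, instruction."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must not be empty")
    prefix = GROUNDING_TAG if grounding else ""
    return f"{IMAGE_PLACEHOLDER}\n{prefix}{instruction}"


if __name__ == "__main__":
    # Plain text extraction without layout information.
    print(repr(build_prompt("Free OCR.")))
    # Layout-aware conversion using grounding mode.
    print(repr(build_prompt("Convert the document to markdown.", grounding=True)))
```

Keeping the tag in one constant makes it easy to switch grounding on only for documents whose structure matters.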

Tips & Limitations

Performance: Prefer vLLM for production workloads; it is the higher-throughput of the two supported backends.
Quality: For complex document structures, use the <|grounding|> prompt mode to improve structural accuracy.
Resource Usage: This model is vision-heavy; ensure your GPU has enough VRAM for high-resolution image inputs and long token sequences.
Whitelisting: When processing tables, include the token IDs of the table tags in the whitelist to prevent truncated or malformed table output during inference.
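For the whitelisting tip, one way to avoid hard-coding token IDs is to look them up from the tokenizer. The sketch below is illustrative only: `table_tag_token_ids` is a hypothetical helper, and the choice of `<td>` and `</td>` as the table tags is an assumption. It relies on `convert_tokens_to_ids`, which Hugging Face tokenizers provide, and works with any object exposing that method.

```python
def table_tag_token_ids(tokenizer, tags=("<td>", "</td>")):
    """Look up the token IDs for table tags so they can be whitelisted.

    The default tags are an assumption; pass the tags your model emits.
    """
    ids = tokenizer.convert_tokens_to_ids(list(tags))
    unknown = getattr(tokenizer, "unk_token_id", None)
    # A tag that maps to None or to the unknown-token ID is not in the vocabulary.
    missing = [t for t, i in zip(tags, ids) if i is None or i == unknown]
    if missing:
        raise KeyError(f"tags not in tokenizer vocabulary: {missing}")
    return set(ids)


# Usage sketch (not executed here; the model ID is an assumption):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-OCR",
#                                       trust_remote_code=True)
#   whitelist = table_tag_token_ids(tok)
```

The resulting set can then be passed to whichever whitelist option your inference script exposes. Failing loudly on an unknown tag avoids silently whitelisting the unknown-token ID.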

Metadata

Author: @adisinghstudent

Stars: 3809

Updated: 2026-04-05

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-adisinghstudent-deepseek-ocr": {
      "enabled": true,
      "auto_update": true
    }
  }
}
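If clawhub.json already lists other plugins, pasting by hand risks overwriting them. The following sketch merges the entry programmatically. It assumes the file is plain JSON with a top-level "plugins" object, as in the snippet above; the function name `enable_plugin` is hypothetical and not part of any ClawHub tooling.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-adisinghstudent-deepseek-ocr"


def enable_plugin(config_path, plugin_id=PLUGIN_ID):
    """Add or update the plugin entry while keeping existing settings."""
    path = Path(config_path)
    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    print(json.dumps(enable_plugin("clawhub.json"), indent=2))
```

Running it twice is harmless: the second run rewrites the same entry without touching other plugins.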

Tags(AI)

#ocr #vision #document-processing #markdown #vllm

Safety Score: 4/5

Flags: file-read, code-execution
