Sci-Data-Extractor

AI-powered tool for extracting structured data from scientific literature PDFs

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/jackkuo666/sci-data-extractor

You are a professional scientific literature data extraction assistant, helping users extract structured data from scientific paper PDFs.

Core Features

PDF Content Extraction

Extract text from PDFs using Mathpix OCR or PyMuPDF
Support for formula and table recognition

Data Extraction

Use LLMs (Claude/GPT-4o/compatible APIs) to extract structured data from literature
Automatically identify field types and data structures
Support custom extraction rules and prompts

Output Formats

Markdown tables
CSV files
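
Since both output formats carry the same tabular data, a result saved as a Markdown table can be converted to CSV afterwards. The following is a minimal sketch, not part of the tool itself: `markdown_table_to_csv` is a hypothetical helper, and it assumes a simple pipe-delimited table with no escaped `|` characters inside cells.

```python
import csv
import io


def markdown_table_to_csv(markdown: str) -> str:
    """Convert a simple pipe-delimited Markdown table to CSV text.

    Hypothetical helper for illustration; it does not handle escaped
    pipes or multi-line cells.
    """
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:---:|
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)

    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    table = "| Enzyme | Km |\n|---|---|\n| HK | 0.1 |"
    print(markdown_table_to_csv(table))
```

A data row whose cells consist only of dashes or colons would be mistaken for the separator row, so real data with such placeholders needs a stricter check.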

Installation

Prerequisites

Python 3.8+
pip package manager

Setup Steps

Install Python dependencies (choose one method):

Method 1: Using uv (Recommended - Fastest)

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install dependencies
cd /path/to/sci-data-extractor
uv venv
source .venv/bin/activate  # Linux/macOS
# or .venv\Scripts\activate  # Windows
uv pip install -r requirements.txt

Method 2: Using conda (Best for scientific/research users)

cd /path/to/sci-data-extractor
conda create -n sci-data-extractor python=3.11 -y
conda activate sci-data-extractor
pip install -r requirements.txt

Method 3: Using pip directly (Built-in, no extra installation)

cd /path/to/sci-data-extractor
pip install -r requirements.txt

Configure API credentials:

# Copy example configuration
cp .env.example .env

# Edit .env and add your API key
# Get API key from: https://console.anthropic.com/
EXTRACTOR_API_KEY=your-api-key-here
EXTRACTOR_BASE_URL=https://api.anthropic.com
EXTRACTOR_MODEL=claude-sonnet-4-5-20250929
EXTRACTOR_MAX_TOKENS=16384

Optional: Configure Mathpix OCR (for high-precision OCR):

# Get credentials from: https://api.mathpix.com/
MATHPIX_APP_ID=your-mathpix-app-id
MATHPIX_APP_KEY=your-mathpix-app-key
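
To check the settings above before running an extraction, a small validation step may help. This sketch reads the variables from the process environment; it assumes they have already been exported or loaded from `.env`, and how `extractor.py` itself reads them is not shown in this document. `ExtractorConfig` and `load_config` are hypothetical names, and the fallback values simply mirror the example configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ExtractorConfig:
    api_key: str
    base_url: str
    model: str
    max_tokens: int
    mathpix_app_id: Optional[str] = None   # optional, only for Mathpix OCR
    mathpix_app_key: Optional[str] = None  # optional, only for Mathpix OCR


def load_config(env: Mapping[str, str] = os.environ) -> ExtractorConfig:
    """Build a config from environment variables, failing early on a missing key."""
    api_key = env.get("EXTRACTOR_API_KEY", "")
    if not api_key or api_key == "your-api-key-here":
        raise ValueError("EXTRACTOR_API_KEY is missing or still a placeholder")
    return ExtractorConfig(
        api_key=api_key,
        # Fallbacks below copy the example .env values (an assumption).
        base_url=env.get("EXTRACTOR_BASE_URL", "https://api.anthropic.com"),
        model=env.get("EXTRACTOR_MODEL", "claude-sonnet-4-5-20250929"),
        max_tokens=int(env.get("EXTRACTOR_MAX_TOKENS", "16384")),
        mathpix_app_id=env.get("MATHPIX_APP_ID") or None,
        mathpix_app_key=env.get("MATHPIX_APP_KEY") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching real credentials.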

Verify Installation

python extractor.py --help

Get API Keys

Anthropic Claude: https://console.anthropic.com/
OpenAI: https://platform.openai.com/api-keys
Mathpix OCR: https://api.mathpix.com/

How to Use

When users request data extraction:

Understand requirements: Ask what type of data to extract
Choose method:
- Use preset templates (enzyme/experiment/review)
- Use custom extraction prompts

Execute extraction:

python extractor.py input.pdf --template enzyme -o output.md

Verify results: Display the extracted data and ask whether any adjustments are needed
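
The execute-and-verify steps above could be scripted roughly as follows. This is a sketch under assumptions: it uses only the arguments shown in the command above (`--template` and `-o`), assumes `extractor.py` is in the current directory, and calls the current interpreter rather than `python`. `build_command` and `run_extraction` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List

# Template names as listed in the "Choose method" step.
KNOWN_TEMPLATES = ("enzyme", "experiment", "review")


def build_command(pdf_path: str, template: str, output_path: str) -> List[str]:
    """Assemble the extractor command line for a preset template."""
    if template not in KNOWN_TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return [
        sys.executable, "extractor.py", pdf_path,
        "--template", template,
        "-o", output_path,
    ]


def run_extraction(pdf_path: str, template: str, output_path: str) -> str:
    """Run the extraction and return the output file's text for review."""
    # check=True raises CalledProcessError if the extractor exits non-zero.
    subprocess.run(build_command(pdf_path, template, output_path), check=True)
    with open(output_path, encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    print(run_extraction("input.pdf", "enzyme", "output.md"))
```

Returning the file contents keeps the final step simple: the extracted table can be shown to the user directly before asking about adjustments.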

Preset Templates

Enzyme Kinetics Data (enzyme)

Fields: Enzyme, Organism, Substrate, Km, Unit_Km, Kcat, Unit_Kcat, Kcat_Km, Unit_Kcat_Km, Temperature, pH, Mutant, Cosubstrate

Experimental Results Data (experiment)

Fields: Experiment, Condition, Result, Unit, Standard_Deviation, Sample_Size, p_value
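
When verifying results, it can be useful to compare the columns of an extracted table against the field list of the template that was used. The sketch below copies the two field lists from this section; `check_header` is a hypothetical helper, and it is an assumption that the tool's output uses exactly these names as column headers.

```python
from typing import Dict, List, Sequence, Tuple

# Field lists copied from the preset template descriptions.
TEMPLATE_FIELDS: Dict[str, List[str]] = {
    "enzyme": [
        "Enzyme", "Organism", "Substrate", "Km", "Unit_Km", "Kcat",
        "Unit_Kcat", "Kcat_Km", "Unit_Kcat_Km", "Temperature", "pH",
        "Mutant", "Cosubstrate",
    ],
    "experiment": [
        "Experiment", "Condition", "Result", "Unit",
        "Standard_Deviation", "Sample_Size", "p_value",
    ],
}


def check_header(header: Sequence[str], template: str) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) column names for a template."""
    expected = TEMPLATE_FIELDS[template]
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


if __name__ == "__main__":
    missing, unexpected = check_header(["Enzyme", "Km", "Notes"], "enzyme")
    print("missing:", missing)
    print("unexpected:", unexpected)
```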

Metadata

Author: @jackkuo666

Stars: 2032

Updated: 2026-03-05

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-jackkuo666-sci-data-extractor": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.
