Official Verified

paper-repro-python

This skill should be used when the user asks to "reproduce a paper", "implement paper methods in Python", "extract paper content to Markdown", or works on paper reproduction tasks. Use for TeX-first extraction, modular Python implementation, and bilingual documentation.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/celynnmoonlight/paper-repro-python

Follow this workflow end-to-end unless the user explicitly asks to skip steps.

1) Intake and scope

  • Confirm input artifacts: TeX source path(s), PDF path, supplementary files, target repository, and expected outputs.
  • State assumptions explicitly when information is missing.
  • Keep approach adaptable to the specific paper; do not force a fixed dependency stack or rigid project template.
  • Check whether the working folder already contains paper source files (.tex, .bib, style files, figures).
  • Check whether the working folder contains user-preprocessed documents (.md, .json, images such as .png, .jpg, .svg).
  • Source priority rule (read in order, stop when sufficient):
    1. TeX sources (preferred): If usable TeX source files (.tex, .bib, style files) are present, use them as the primary source.
    2. User-preprocessed documents (secondary): If TeX is absent or incomplete, read user-provided documents (.md, .json) and images (.png, .jpg, .svg) that may contain pre-extracted paper content.
    3. PDF fallback (last resort): Only when both TeX and user-preprocessed documents are unavailable or insufficient, fall back to PDF extraction.
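
The source priority rule above could be sketched as follows. This is a minimal illustration, not part of the skill itself: the function name `choose_source`, the suffix sets, and the tier labels are all hypothetical, and the check only tests whether files are present. Whether a source is actually "sufficient" still needs a human or agent judgment.

```python
from pathlib import Path

# Hypothetical suffix groups, mirroring the file types named in the rule above.
TEX_SUFFIXES = {".tex", ".bib", ".sty", ".cls"}
PREPROCESSED_SUFFIXES = {".md", ".json", ".png", ".jpg", ".svg"}
PDF_SUFFIXES = {".pdf"}


def choose_source(folder):
    """Return (tier, files) for the highest-priority source found in folder.

    Tiers are tried in order: "tex", "preprocessed", "pdf"; "none" if nothing matches.
    """
    files = sorted(p for p in Path(folder).rglob("*") if p.is_file())

    def with_suffix(suffixes):
        return [p for p in files if p.suffix.lower() in suffixes]

    tex_files = with_suffix(TEX_SUFFIXES)
    # Assumption: a TeX tier is only usable if at least one .tex file exists;
    # a stray .bib or .sty alone does not count.
    if any(p.suffix.lower() == ".tex" for p in tex_files):
        return "tex", tex_files

    preprocessed = with_suffix(PREPROCESSED_SUFFIXES)
    if preprocessed:
        return "preprocessed", preprocessed

    pdfs = with_suffix(PDF_SUFFIXES)
    if pdfs:
        return "pdf", pdfs

    return "none", []
```

A caller would typically log the chosen tier next to the stated assumptions, so that a later fallback from TeX to PDF is visible in the notes.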

2) Source extraction (TeX → preprocessed docs → PDF)

  • TeX path (highest priority):

    • Parse and read the main TeX project structure first (main.tex or equivalent entry file and includes).
    • Preserve original scientific wording when converting relevant content to Markdown notes.
    • Resolve equations, theorem blocks, citations, and appendices from source files whenever possible.
    • Record unresolved include/bibliography issues explicitly; do not invent missing content.
  • User-preprocessed documents path (secondary):

    • Read Markdown files (.md) that may contain paper content extracted by the user.
    • Read JSON files (.json) that may contain structured paper data (metadata, sections, references).
    • View images (.png, .jpg, .svg) that may contain paper figures, tables, or scanned pages.
    • Preserve original content; do not summarize or paraphrase.
    • Note the source of each piece of information (which file, which section).
  • PDF fallback path (lowest priority, when all else fails):

    • Extract paper content page by page into Markdown, preserving the original wording.
    • Do not summarize, paraphrase, or rewrite scientific statements.
    • Preserve structure faithfully:
      • Title, authors, affiliations, abstract, sections, subsections.
      • Equations (LaTeX-friendly when possible), theorem/lemma/proposition blocks.
      • Tables, figure captions, references, appendices, footnotes.
    • If a PDF is scanned or partially unreadable:
      • Run OCR and mark uncertain spans clearly.
      • Never silently invent missing text.
    • Include image references/placeholders when figures cannot be represented as plain text.
    • Produce one primary output file such as paper_fulltext.md.
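
For the TeX path described above, walking the project from its entry file and recording unresolved includes could look roughly like this. It is a sketch under simplifying assumptions: the name `resolve_includes` is hypothetical, only `\input{...}` and `\include{...}` are followed, paths are resolved relative to the entry file's folder, and comment stripping ignores edge cases such as `\\%`.

```python
import re
from pathlib import Path

# Matches \input{name} and \include{name}; \includegraphics is not matched
# because the brace must follow the command name directly.
INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def strip_comment(line):
    """Drop everything after an unescaped % (simplified TeX comment rule)."""
    return re.sub(r"(?<!\\)%.*", "", line)


def resolve_includes(entry):
    """Return (resolved_files, unresolved) starting from the entry .tex file.

    unresolved is a list of (including_file_name, requested_name) pairs, so
    missing content is recorded explicitly instead of being guessed.
    """
    entry_path = Path(entry).resolve()
    root = entry_path.parent
    resolved, unresolved, seen = [], [], set()

    def visit(path):
        if path in seen:  # guard against repeated or circular includes
            return
        seen.add(path)
        resolved.append(path)
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            for name in INCLUDE_RE.findall(strip_comment(line)):
                target = root / name.strip()
                if target.suffix != ".tex":
                    target = target.with_name(target.name + ".tex")
                if target.is_file():
                    visit(target.resolve())
                else:
                    unresolved.append((path.name, name))

    visit(entry_path)
    return resolved, unresolved
```

The `unresolved` list can be copied straight into the extraction notes, which keeps the "do not invent missing content" rule auditable.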

3) Extraction quality checks

Metadata

Stars: 3951
Views: 0
Updated: 2026-04-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-celynnmoonlight-paper-repro-python": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.