Follow this workflow end-to-end unless the user explicitly asks to skip steps.

1) Intake and scope

- Confirm input artifacts: TeX source path(s), PDF path, supplementary files, target repository, and expected outputs.
- State assumptions explicitly when information is missing.
- Keep the approach adaptable to the specific paper; do not force a fixed dependency stack or rigid project template.
- Check whether the working folder already contains paper source files (.tex, .bib, style files, figures).
- Check whether the working folder contains user-preprocessed documents (.md, .json, images such as .png, .jpg, .svg).

Source priority rule (read in order, stop when sufficient):
1. TeX sources (preferred): If usable TeX source files (.tex, .bib, style files) are present, use them as the primary source.
2. User-preprocessed documents (secondary): If TeX is absent or incomplete, read user-provided documents (.md, .json) and images (.png, .jpg, .svg) that may contain pre-extracted paper content.
3. PDF fallback (last resort): Only when both TeX and user-preprocessed documents are unavailable or insufficient, fall back to PDF extraction.
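
The priority rule above can be sketched as a small tier check. This is only an illustration: the function name `choose_source`, the suffix sets, and the tier labels are assumptions, not part of the workflow. Finding files in a tier does not prove they are sufficient, so the result should still be reviewed before moving on.

```python
from pathlib import Path

# Hypothetical suffix sets for each tier; adjust to the paper at hand.
TEX_SUFFIXES = {".tex", ".bib", ".sty", ".cls", ".bst"}
PREPROCESSED_SUFFIXES = {".md", ".json", ".png", ".jpg", ".svg"}
PDF_SUFFIXES = {".pdf"}


def choose_source(workdir):
    """Return (tier, files) for the highest-priority tier that has files."""
    files = [p for p in Path(workdir).rglob("*") if p.is_file()]
    tiers = [
        ("tex", TEX_SUFFIXES),
        ("preprocessed", PREPROCESSED_SUFFIXES),
        ("pdf", PDF_SUFFIXES),
    ]
    for name, suffixes in tiers:
        matches = sorted(p for p in files if p.suffix.lower() in suffixes)
        # Style or .bib files alone are not a usable TeX source.
        if name == "tex" and not any(p.suffix.lower() == ".tex" for p in matches):
            continue
        if matches:
            return name, matches
    return "none", []
```

A call such as `choose_source(".")` returns the tier label and the sorted list of matching paths. Whether that tier is complete enough to stop at is a judgment the sketch does not make.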

2) Source extraction (TeX → preprocessed docs → PDF)

TeX path (highest priority):
- Parse and read the main TeX project structure first (main.tex or equivalent entry file and includes).
- Preserve original scientific wording when converting relevant content to Markdown notes.
- Resolve equations, theorem blocks, citations, and appendices from source files whenever possible.
- Record unresolved include/bibliography issues explicitly; do not invent missing content.
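
As a rough sketch of reading the project structure and recording unresolved includes, the helper below walks `\input{...}` and `\include{...}` commands from an entry file. The name `resolve_includes` is hypothetical. The sketch assumes braced arguments, paths relative to the entry file's folder, and a naive `%` comment strip; real projects may need more.

```python
import re
from pathlib import Path

# Matches \input{name} and \include{name}; the unbraced form is not handled.
INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")
# Drops everything after an unescaped % on a line (naive comment removal).
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def resolve_includes(entry):
    """Return (resolved, unresolved) lists of paths reachable from entry."""
    entry = Path(entry)
    root = entry.parent
    resolved, unresolved = [], []
    stack, seen = [entry], set()
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        if not path.is_file():
            # Record the gap explicitly instead of guessing its content.
            unresolved.append(path)
            continue
        resolved.append(path)
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            code = COMMENT_RE.sub("", line)
            for target in INCLUDE_RE.findall(code):
                child = root / target.strip()
                if not child.suffix:
                    child = child.with_suffix(".tex")
                stack.append(child)
    return resolved, unresolved
```

The `unresolved` list is what would go into the notes as explicit open issues. The traversal order of `resolved` is not meaningful here.
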

User-preprocessed documents path (secondary):
- Read Markdown files (.md) that may contain paper content extracted by the user.
- Read JSON files (.json) that may contain structured paper data (metadata, sections, references).
- View images (.png, .jpg, .svg) that may contain paper figures, tables, or scanned pages.
- Preserve original content; do not summarize or paraphrase.
- Note the source of each piece of information (which file, which section).
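
One way to keep the file and section attached to each piece of content is a small provenance record. The sketch below is an assumption-heavy illustration: `SourcedText`, `read_markdown`, and `read_json` are hypothetical names, Markdown is split on lines starting with `#` (which would misfire inside fenced code), and the JSON layout is assumed to be a top-level object, since the actual schema of user-provided files is unknown.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SourcedText:
    file: str     # which file the text came from
    section: str  # Markdown heading or top-level JSON key
    text: str     # verbatim content, not summarized


def read_markdown(path):
    """Split a Markdown file on '#' headings, keeping the text verbatim."""
    records, section, lines = [], "(preamble)", []

    def flush():
        if any(line.strip() for line in lines):
            records.append(SourcedText(str(path), section, "\n".join(lines)))

    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.startswith("#"):
            flush()
            section, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    flush()
    return records


def read_json(path):
    """Emit one record per top-level key; non-string values stay as JSON."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        data = {"(root)": data}
    return [
        SourcedText(
            str(path),
            key,
            value if isinstance(value, str) else json.dumps(value, ensure_ascii=False),
        )
        for key, value in data.items()
    ]
```

Each record carries its origin, so a later note can cite the file and section it was taken from without re-reading the inputs.
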

PDF fallback path (lowest priority, when all else fails):
- Extract paper content page by page into Markdown, preserving the original wording.
- Do not summarize, paraphrase, or rewrite scientific statements.
- Preserve structure faithfully:
  - Title, authors, affiliations, abstract, sections, subsections.
  - Equations (LaTeX-friendly when possible), theorem/lemma/proposition blocks.
  - Tables, figure captions, references, appendices, footnotes.
- If a PDF is scanned or partially unreadable:
  - Run OCR and mark uncertain spans clearly.
  - Never silently invent missing text.
- Include image references/placeholders when figures cannot be represented as plain text.
- Produce one primary output file such as paper_fulltext.md.
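
The page-by-page assembly with marked uncertain spans might look like the sketch below. It assumes some extraction or OCR tool has already produced, for each page, a list of `(text, confidence)` spans. The function names, the `[UNCERTAIN: ...]` and `[UNREADABLE PAGE]` markers, the page comment format, and the 0.80 threshold are all illustrative choices, not a prescribed format.

```python
def mark_uncertain(spans, threshold=0.80):
    """Join spans, wrapping low-confidence ones in a visible marker."""
    out = []
    for text, confidence in spans:
        if confidence < threshold:
            out.append(f"[UNCERTAIN: {text}]")
        else:
            out.append(text)
    return " ".join(out)


def assemble_fulltext(pages):
    """Build one Markdown string from per-page span lists."""
    parts = []
    for number, spans in enumerate(pages, start=1):
        parts.append(f"<!-- page {number} -->")
        body = mark_uncertain(spans)
        # An empty page is flagged rather than silently dropped or filled in.
        parts.append(body if body else "[UNREADABLE PAGE]")
    return "\n\n".join(parts) + "\n"
```

The returned string can then be written to the primary output file, for example with `Path("paper_fulltext.md").write_text(markdown, encoding="utf-8")`.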

paper-repro-python
