paper-fetch

Use when the user wants to download a paper PDF from a DOI, title, or URL via legal open-access sources. Tries Unpaywall, arXiv, bioRxiv/medRxiv, PubMed Central, and Semantic Scholar in order. Never uses Sci-Hub or paywall bypass.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/agents365-ai/paper-fetch

Fetch the legal open-access PDF for a paper given a DOI (or title). Tries multiple OA sources in priority order and stops at the first hit.

Agent-native. Structured JSON envelope on stdout, NDJSON progress on stderr, stable exit codes, machine-readable schema, TTY-aware format default, idempotent retries.
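The TTY-aware default and the stdout/stderr split described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `pick_format`, `progress`, and the event fields are hypothetical names chosen for the example.

```python
import json
import sys


def pick_format(requested: str, stdout_is_tty: bool) -> str:
    """Resolve 'auto' to 'text' on a terminal and 'json' otherwise."""
    if requested != "auto":
        return requested  # an explicit --format value always wins
    return "text" if stdout_is_tty else "json"


def progress(event: str, **fields) -> None:
    """Write one NDJSON record to stderr so stdout carries only the envelope."""
    # The record layout here is an assumption made for illustration.
    sys.stderr.write(json.dumps({"event": event, **fields}) + "\n")
```

A caller would typically pass `sys.stdout.isatty()` as the second argument, so piping the output to another program selects JSON without any extra flag.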

Resolution order

  1. Unpaywall: https://api.unpaywall.org/v2/{doi}?email=$UNPAYWALL_EMAIL, read best_oa_location.url_for_pdf (skipped if UNPAYWALL_EMAIL is not set)
  2. Semantic Scholar: https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}?fields=openAccessPdf,externalIds
  3. arXiv: if externalIds.ArXiv is present, https://arxiv.org/pdf/{arxiv_id}.pdf
  4. PubMed Central OA: if a PMCID is present, https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/
  5. bioRxiv / medRxiv: if the DOI prefix is 10.1101, query https://api.biorxiv.org/details/{server}/{doi} for the latest version PDF URL
  6. Otherwise → report failure with title/authors so the user can request the paper via interlibrary loan (ILL)
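The priority order above can be sketched as a pure function that builds the lookup URLs without touching the network. This is an illustration under assumptions: `candidate_lookups` is a hypothetical helper, and it takes `arxiv_id` and `pmcid` as arguments even though a real run would learn them from an earlier response.

```python
from urllib.parse import quote


def candidate_lookups(doi, email=None, arxiv_id=None, pmcid=None, server="biorxiv"):
    """Return (source, url) pairs in the documented priority order."""
    steps = []
    if email:  # Unpaywall is skipped when no email is configured
        steps.append((
            "unpaywall",
            f"https://api.unpaywall.org/v2/{doi}?email={quote(email, safe='@')}",
        ))
    steps.append((
        "semantic_scholar",
        "https://api.semanticscholar.org/graph/v1/paper/"
        f"DOI:{doi}?fields=openAccessPdf,externalIds",
    ))
    if arxiv_id:
        steps.append(("arxiv", f"https://arxiv.org/pdf/{arxiv_id}.pdf"))
    if pmcid:
        steps.append((
            "pmc",
            f"https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/",
        ))
    if doi.startswith("10.1101/"):  # bioRxiv / medRxiv DOI prefix
        steps.append((
            "biorxiv",
            f"https://api.biorxiv.org/details/{server}/{doi}",
        ))
    return steps
```

A caller would walk the returned list and stop at the first source that yields a PDF; an empty result from every step corresponds to the failure report in step 6.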

If only a title is given, resolve to a DOI first via Semantic Scholar search_paper_by_title (asta MCP) or Crossref.
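For the Crossref route, a title lookup might look like the sketch below. It assumes the public Crossref REST API (`/works` with `query.bibliographic`) and its `message.items[].DOI` response layout; the source does not say which Crossref parameters the script uses, and both function names are hypothetical.

```python
from urllib.parse import urlencode


def crossref_title_query(title: str, rows: int = 1) -> str:
    """Build a Crossref works query URL for a free-text title."""
    params = urlencode({"query.bibliographic": title, "rows": rows})
    return "https://api.crossref.org/works?" + params


def first_doi(response: dict):
    """Pull the DOI of the top hit out of a decoded Crossref response, if any."""
    items = response.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None
```

The top hit is only a best guess, so a careful caller would compare the returned title against the query before fetching.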

Usage

python scripts/fetch.py <DOI> [options]
python scripts/fetch.py --batch <FILE|-> [options]
python scripts/fetch.py schema           # machine-readable self-description

Flags

| Flag | Default | Description |
| --- | --- | --- |
| doi | | DOI to fetch (positional). Use - to read a single DOI from stdin |
| --batch FILE | | File with one DOI per line for bulk download. Use - to read from stdin |
| --out DIR | pdfs | Output directory |
| --dry-run | off | Resolve sources without downloading; preview PDF URL and destination |
| --format | auto | json for agents, text for humans. Auto-detects: json when stdout is not a TTY, text when it is |
| --pretty | off | Pretty-print JSON with 2-space indent |
| --stream | off | Emit one NDJSON per line on stdout as each DOI resolves, then a summary line (batch mode) |
| --overwrite | off | Re-download even when destination file already exists |
| --idempotency-key KEY | | Safe-retry key. Re-running with the same key replays the original envelope from <out>/.paper-fetch-idem/ without network I/O |
| --timeout SECONDS | 30 | HTTP timeout per request |
| --version | | Print CLI + schema version and exit |
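The replay behavior of --idempotency-key can be pictured with the sketch below. Only the `.paper-fetch-idem` directory name comes from the description above; hashing the key into a file name, the JSON file layout, and the `run_once` helper are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def idem_path(out_dir: Path, key: str) -> Path:
    """Map a retry key to a cache file (hashed file name is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return out_dir / ".paper-fetch-idem" / f"{digest}.json"


def run_once(out_dir: Path, key: str, do_fetch) -> dict:
    """Replay the stored envelope for a known key; otherwise fetch and store it."""
    path = idem_path(out_dir, key)
    if path.exists():
        # Same key seen before: return the original envelope, no network I/O.
        return json.loads(path.read_text(encoding="utf-8"))
    envelope = do_fetch()  # hypothetical callable that performs the real work
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(envelope), encoding="utf-8")
    return envelope
```

Storing the whole envelope rather than a success flag is what lets a retry return exactly what the first run reported.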

Agent discovery: schema subcommand

python scripts/fetch.py schema

Emits a complete machine-readable description of the CLI on stdout (no network). Includes cli_version, schema_version, parameter types, exit codes, error codes, envelope shapes, and environment variables. Agents should read this once, cache it against schema_version, and re-read when the cached version drifts.
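An agent-side cache keyed on schema_version could look like the following sketch. It assumes the schema subcommand prints JSON with a top-level `schema_version` field, and that the agent learns the current version some other way (for example from --version); `load_schema` and `cached_schema` are hypothetical helpers.

```python
import json
import subprocess
import sys
from pathlib import Path


def load_schema() -> dict:
    """Run the schema subcommand and decode its stdout (assumed to be JSON)."""
    result = subprocess.run(
        [sys.executable, "scripts/fetch.py", "schema"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def cached_schema(cache_file: Path, current_version, loader=load_schema) -> dict:
    """Reuse the cached schema unless its schema_version has drifted."""
    if cache_file.exists():
        cached = json.loads(cache_file.read_text(encoding="utf-8"))
        if cached.get("schema_version") == current_version:
            return cached
    fresh = loader()  # re-read on a cache miss or version drift
    cache_file.write_text(json.dumps(fresh), encoding="utf-8")
    return fresh
```

Passing the loader as a parameter keeps the caching logic testable without invoking the CLI.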

Output contract

stdout emits a single JSON envelope. Every envelope carries a meta slot.

Success (all DOIs resolved):

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-agents365-ai-paper-fetch": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

semanticscholar-skill

Use when searching academic papers, looking up citations, finding authors, or getting paper recommendations using the Semantic Scholar API. Triggers on queries about research papers, academic search, citation analysis, or literature discovery.

agents365-ai 4473

grant-thinking-general

Use when evaluating grant ideas, diagnosing proposal logic, framing fundable projects, strengthening reviewer-aware arguments, or preparing to write any section of a research proposal.

agents365-ai 4473

journal-abbrev

Use when looking up journal or magazine name abbreviations, converting between full names and ISO 4/MEDLINE abbreviations, processing BibTeX files for journal name standardization, or answering questions about 期刊缩写/杂志缩写. Triggers on "journal abbreviation", "abbreviate journal", "journal name", "期刊缩写", "杂志缩写", "ISO 4", "LTWA", "BibTeX journal". PROACTIVELY USE when user mentions citation formatting, reference list preparation, or manuscript submission to specific journals.

agents365-ai 4473

scholar-deep-research

Use when the user asks for a literature review, academic deep dive, research report, state-of-the-art survey, topic scoping, comparative analysis of methods/papers, grant background, or any request that needs multi-source scholarly evidence with citations. Also trigger proactively when a user question clearly requires academic grounding (e.g. "what's known about X", "compare approach A vs B in the literature", "summarize the field of Y"). Runs an 8-phase (Phase 0..7), script-driven research workflow across OpenAlex, arXiv, Crossref, and PubMed, with deduplication, transparent ranking, citation chasing, self-critique, and structured report output with verifiable citations.

agents365-ai 4473

asta-skill

Domain expertise for Ai2 Asta MCP tools (Semantic Scholar corpus). Intent-to-tool routing, safe defaults, workflow patterns, and pitfall warnings for academic paper search, citation traversal, and author discovery.

agents365-ai 4473