Official Verified

pdf-rename

Rename academic PDF papers to the standardized format "[Year] [Venue] Title.pdf" using a three-stage pipeline (Extract → Verify → Rename). Use when the user asks to organize, batch-rename, or metadata-enrich PDF files in a folder. Activates on keywords like "rename PDFs", "organize papers", "batch rename PDFs", "rename papers by metadata", "pdf重命名" ("PDF rename"), "文献整理" ("organize literature").


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/67available/pdf-rename

PDF Rename — Academic Paper Organizer

Rename academic PDFs to: [Year] [Venue] Title.pdf

Three-stage pipeline (strict order):

Extract → Verify → Rename

Anti-error principle: Never re-parse PDF content during Rename stage. The Manifest is the single source of truth.


Quick Start

# Stage 1: Extract metadata → generate manifest
python scripts/extract.py "<folder_path>"

# Stage 2: Verify (manual or web search), then inject verified data
#   → Edit the VERIFIED_DATA dict (used by scripts/apply_verified.py) with web-verified values
python scripts/apply_verified.py "<folder_path>"

# Stage 3: Preview rename plan
python scripts/execute.py "<folder_path>" --preview

# Execute rename (with backup)
python scripts/execute.py "<folder_path>" --execute

Workflow Details

Stage 1: Extract

scripts/extract.py reads every PDF in the folder and generates manifest.json.

For each PDF it extracts:

  • Title: from PDF first-page text (heuristic: first non-metadata line)
  • Year: from filename prefix (most reliable) or PDF text (conference-year pattern)
  • Venue: inferred from PDF text (NeurIPS, ICML, arXiv, etc.)
  • Status: needs_verification (title/year from auto-extraction)
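The year and venue heuristics above could look roughly like the following. This is a minimal sketch, not the code in scripts/extract.py: the function names, the KNOWN_VENUES list, and the entry keys are all assumptions made for illustration.

```python
import re

# Hypothetical venue list; the real script may recognize different names.
KNOWN_VENUES = ("NeurIPS", "ICML", "ICLR", "CVPR", "ACL", "arXiv")


def year_from_filename(filename):
    """Return a leading 4-digit year (19xx/20xx) from the filename, or None."""
    match = re.match(r"\s*[\[(]?((?:19|20)\d{2})(?!\d)", filename)
    return int(match.group(1)) if match else None


def year_from_text(text):
    """Return the year from a 'Venue 2023' style pattern in the text, or None."""
    venues = "|".join(re.escape(v) for v in KNOWN_VENUES if v != "arXiv")
    match = re.search(rf"\b(?:{venues})\s*((?:19|20)\d{{2}})\b", text)
    return int(match.group(1)) if match else None


def venue_from_text(text):
    """Return the first known venue mentioned in the text, or None."""
    pattern = "|".join(re.escape(v) for v in KNOWN_VENUES)
    match = re.search(rf"\b({pattern})\b", text, flags=re.IGNORECASE)
    if not match:
        return None
    canonical = {v.lower(): v for v in KNOWN_VENUES}
    return canonical[match.group(1).lower()]


def draft_entry(filename, first_page_text):
    """Build an unverified entry; the filename year takes priority over PDF text."""
    year = year_from_filename(filename) or year_from_text(first_page_text)
    return {
        "filename": filename,
        "year": year,
        "venue": venue_from_text(first_page_text),
        "status": "needs_verification",  # auto-extracted values are never trusted
    }
```

Every entry starts as needs_verification regardless of how confident the heuristics look, which matches the pipeline's rule that nothing is renamed before Stage 2.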

Manifest schema — see references/manifest_spec.md

⚠️ PDF text extraction is unreliable for titles; the filename is usually a better title source than the PDF text. Always verify with a web search before executing the rename.

Stage 2: Verify

Before renaming, verify the following manually or via web search:

  1. Title is correct (filename is often sufficient, but multi-word titles may differ)
  2. Year is correct (arXiv submission year ≠ conference year)
  3. Venue is correct

Inject verified data via scripts/apply_verified.py:

  • Key = original filename (exact match)
  • Value = dict with the keys 'title', 'year', 'venue', and 'confirmed' (set 'confirmed': True to approve the entry)

Set 'confirmed': False, or omit the entry, for files that should be skipped.
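As an illustration of the key/value layout described above, the sketch below shows one possible VERIFIED_DATA dict and a helper that merges it into a manifest. The manifest shape (a list of dicts with "filename" and "status" keys), the helper name apply_verified, and the second sample entry are assumptions; the actual schema is defined in references/manifest_spec.md.

```python
# Keys must match the original filenames exactly.
VERIFIED_DATA = {
    "1706.03762.pdf": {
        "title": "Attention Is All You Need",
        "year": 2017,
        "venue": "NeurIPS",
        "confirmed": True,
    },
    # Hypothetical file that has not been verified yet, so it will be skipped.
    "draft_v2.pdf": {"title": "", "year": None, "venue": "", "confirmed": False},
}


def apply_verified(manifest, verified):
    """Return a new manifest with confirmed values merged in and marked ready."""
    updated = []
    for entry in manifest:
        data = verified.get(entry["filename"])
        if data and data.get("confirmed"):
            entry = {
                **entry,
                "title": data["title"],
                "year": data["year"],
                "venue": data["venue"],
                "status": "ready",
            }
        updated.append(entry)
    return updated
```

Entries without a confirmed match keep their original status, so they stay out of the rename stage.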

Stage 3: Rename

scripts/execute.py reads manifest and renames files:

  • Only files whose status is ready are renamed
  • Duplicate titles → append (1), (2), etc.
  • Files with status needs_verification or manual_review are skipped
  • Backup is created automatically at <folder>/_backup_YYYYMMDD_HHMMSS/
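The rules above can be sketched as a pure planning step that never opens a PDF and works only from manifest entries. This is an assumption-laden illustration, not the contents of scripts/execute.py: the function names and entry keys are hypothetical, and giving every member of a duplicate group a numbered suffix is one reading of the "(1), (2)" rule.

```python
import re
from collections import Counter
from datetime import datetime
from pathlib import Path


def target_name(entry):
    """Build '[Year] [Venue] Title.pdf', dropping characters invalid in filenames."""
    title = re.sub(r'[\\/:*?"<>|]', "", entry["title"]).strip()
    return f"[{entry['year']}] [{entry['venue']}] {title}.pdf"


def plan_renames(manifest):
    """Return (old_name, new_name) pairs for entries whose status is 'ready'."""
    ready = [e for e in manifest if e.get("status") == "ready"]
    totals = Counter(target_name(e) for e in ready)
    seen = Counter()
    plan = []
    for entry in ready:
        name = target_name(entry)
        if totals[name] > 1:  # duplicate target: number each copy
            seen[name] += 1
            name = f"{name[:-4]} ({seen[name]}).pdf"
        plan.append((entry["filename"], name))
    return plan


def backup_dir(folder, now=None):
    """Return the timestamped backup path <folder>/_backup_YYYYMMDD_HHMMSS."""
    now = now or datetime.now()
    return Path(folder) / f"_backup_{now:%Y%m%d_%H%M%S}"
```

Printing the result of plan_renames without touching the filesystem is roughly what a preview mode would do; the execute step would copy the originals into backup_dir(folder) before applying the plan.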

Key Design Decisions

| Problem | Solution |
| --- | --- |
| PDF title extraction garbled/incomplete | Use filename as primary title source; PDF text only for venue/year hints |
| Wrong year from arXiv ID vs conference year | Verify with web search; inject corrected year in VERIFIED_DATA |
| Duplicate papers (same content, different filenames) | Detect via title similarity; rename both with (1), (2) suffixes |
| Accidental data loss | Always create timestamped backup before renaming |
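The source does not say how title similarity is measured. One plausible approach, shown here purely as an assumption, compares normalized titles with difflib.SequenceMatcher from the Python standard library; the 0.9 threshold and the function names are illustrative, not values taken from the skill.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def similar_titles(a, b, threshold=0.9):
    """Return True when two titles are near-identical after normalization."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def find_duplicates(titles, threshold=0.9):
    """Return index pairs (i, j), i < j, of titles that look like duplicates."""
    pairs = []
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if similar_titles(titles[i], titles[j], threshold):
                pairs.append((i, j))
    return pairs
```

The pairwise comparison is quadratic in the number of titles, which should be acceptable for a single folder of papers but would need a different strategy for very large collections.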

Metadata

Stars: 4473
Views: 1
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-67available-pdf-rename": {
      "enabled": true,
      "auto_update": true
    }
  }
}
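If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The sketch below merges the entry instead; it assumes only the JSON structure shown above, and the helper name enable_plugin and the config path argument are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-67available-pdf-rename"


def enable_plugin(config_path):
    """Add or update the plugin entry while keeping other plugins untouched."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("plugins", {})[PLUGIN_ID] = {
        "enabled": True,
        "auto_update": True,
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling enable_plugin("clawhub.json") from the directory that holds the config file would apply the change in place.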
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.