Official | Verified | data analysis | Safety 4/5

pmc-harvest

Fetch articles from PubMed Central using NCBI APIs. Search journals, retrieve full text via OAI-PMH, batch harvest for RAG pipelines. No API key required.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/angusthefuzz/pmc-harvest

What This Skill Does

The pmc-harvest skill interfaces directly with the PubMed Central (PMC) database run by the National Center for Biotechnology Information (NCBI). It lets OpenClaw users search, retrieve, and process medical and scientific literature programmatically. Searches go through the E-utilities suite, and full-text retrieval uses the OAI-PMH protocol, so the skill can extract full-text XML articles, metadata, and abstracts. This makes it useful for building RAG (Retrieval-Augmented Generation) pipelines, conducting systematic reviews, or archiving open-access scientific data, all without an individual API key.
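
As a rough illustration of the two NCBI interfaces named above, the sketch below builds an E-utilities search URL and an OAI-PMH GetRecord URL. It is not the skill's own code: the function names are hypothetical, and the base URLs, the field tags in the search term, and the `pmc` metadata prefix are assumptions to check against current NCBI documentation, since NCBI has relocated these services before.

```javascript
// Sketch only: endpoint URLs and query syntax are assumptions, not taken from the skill.
const EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";
// Assumed PMC OAI-PMH base URL; verify against the current PMC documentation.
const OAI_BASE = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi";

// Build an ESearch URL for open-access PMC articles from one journal and year.
function buildSearchUrl(journal, year, retmax = 20) {
  const term = `"${journal}"[Journal] AND ${year}[pdat] AND "open access"[filter]`;
  const params = new URLSearchParams({
    db: "pmc",
    term,
    retmax: String(retmax),
    retmode: "json",
  });
  return `${EUTILS_BASE}/esearch.fcgi?${params}`;
}

// Build an OAI-PMH GetRecord URL for one article's full-text XML.
// Accepts "PMC12345678" or "12345678"; the OAI identifier uses the numeric part.
function buildGetRecordUrl(pmcid) {
  const numericId = String(pmcid).replace(/^PMC/i, "");
  if (!/^\d+$/.test(numericId)) {
    throw new Error(`Not a valid PMCID: ${pmcid}`);
  }
  const params = new URLSearchParams({
    verb: "GetRecord",
    identifier: `oai:pubmedcentral.nih.gov:${numericId}`,
    metadataPrefix: "pmc",
  });
  return `${OAI_BASE}?${params}`;
}

// Example use (performs a network request; needs Node 18+ for global fetch):
// const res = await fetch(buildSearchUrl("Stroke", 2025));
// const ids = (await res.json()).esearchresult.idlist;
```

Keeping URL construction separate from the network call makes the query logic easy to inspect and test without contacting NCBI.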

Installation

To install this skill, run the following command in your terminal:

clawhub install openclaw/skills/skills/angusthefuzz/pmc-harvest

Ensure you have a Node.js environment configured within your OpenClaw directory, as the skill uses native JS scripts to handle API interactions and XML parsing.

Use Cases

  • Research Automation: Automate the collection of new studies from specific journals to keep your RAG database current.
  • Systematic Reviews: Efficiently pull full-text content for large sets of PMCIDs to speed up literature screening.
  • Data Mining: Extract raw JATS XML structures for custom text analysis or entity recognition tasks.
  • Content Aggregation: Batch harvest open-access articles from journals like 'Stroke' or 'BMC Neurology' for offline analysis.

Example Prompts

  1. "Search for the latest articles in the 'Journal of Stroke' published in 2025 and list the titles for me."
  2. "Download the full text XML for PMC12345678 and extract the main body content."
  3. "Run a batch harvest using the journals.json file to gather all open-access articles from the configured journals for my current project."
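
For the second prompt, extracting the main body from a JATS full-text record could look roughly like the sketch below. The `extractBodyText` helper is hypothetical and says nothing about how the skill itself parses XML. It uses a regular expression for brevity, so it only handles the first `<body>` element and a few common entities; a real XML parser would be the safer choice for production use.

```javascript
// Sketch only: a minimal, regex-based body extractor for JATS XML.
// Returns the plain text of the first <body> element, or null when the
// record has no body (for example, a metadata-only record).
function extractBodyText(jatsXml) {
  const match = jatsXml.match(/<body\b[^>]*>([\s\S]*?)<\/body>/i);
  if (!match) return null;
  return match[1]
    .replace(/<[^>]+>/g, " ")  // replace every tag with a space
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")    // decode &amp; last to avoid double-decoding
    .replace(/\s+/g, " ")      // collapse runs of whitespace
    .trim();
}

const sample =
  "<article><front><article-title>T</article-title></front>" +
  "<body><sec><title>Intro</title><p>Stroke &amp; recovery in " +
  "<italic>n</italic> &lt; 50 patients.</p></sec></body></article>";

console.log(extractBodyText(sample));
// prints "Intro Stroke & recovery in n < 50 patients."
```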

Tips & Limitations

  • Open-Access Only: This skill can only access PMC records that are marked as open-access. Restricted or paywalled content is not available as full text through PMC's OAI-PMH service.
  • Rate Limiting: Without an API key, NCBI imposes a rate limit of approximately 3 requests per second. For large-scale batch operations, consider adding a delay or spacing out your requests to avoid being blocked.
  • Peak Time Sensitivity: NCBI recommends avoiding weekday peak hours (5 AM – 9 PM ET) for large jobs. Scheduling heavy harvesting for off-peak times or weekends tends to give better performance and keeps you within NCBI's usage guidelines.
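
The rate-limit and peak-hour tips above can be sketched as a small scheduler. This is a minimal illustration, not the skill's implementation: `createThrottle` and `isOffPeak` are hypothetical names, the 350 ms spacing is an assumed margin that keeps traffic just under 3 requests per second, and counting weekends as off-peak is an assumption based on NCBI's general usage guidelines.

```javascript
// Assumed spacing: 350 ms between request starts stays just under 3 per second.
const MIN_INTERVAL_MS = 350;

// Returns schedule(task): runs tasks one at a time, at least intervalMs apart.
// `now` and `sleep` are injectable so the timing logic can be tested without waiting.
function createThrottle(intervalMs = MIN_INTERVAL_MS, {
  now = () => Date.now(),
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  let nextSlot = 0;
  let chain = Promise.resolve();
  return function schedule(task) {
    const run = chain.then(async () => {
      const wait = nextSlot - now();
      if (wait > 0) await sleep(wait);
      nextSlot = now() + intervalMs;
      return task();
    });
    chain = run.catch(() => {}); // a failed task must not block later ones
    return run;
  };
}

// True outside 5 AM to 9 PM US Eastern time.
// Assumption: weekends are treated as off-peak all day.
function isOffPeak(date = new Date()) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/New_York",
    weekday: "short",
    hour: "numeric",
    hourCycle: "h23",
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  const weekday = get("weekday");
  const hour = Number(get("hour"));
  if (weekday === "Sat" || weekday === "Sun") return true;
  return hour < 5 || hour >= 21;
}

// Example use (network calls omitted):
// const schedule = createThrottle();
// if (isOffPeak()) await Promise.all(urls.map((u) => schedule(() => fetch(u))));
```

Serializing requests through a single promise chain is a deliberately simple design; it trades some throughput for a guarantee that two requests never start closer together than the chosen interval.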

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-angusthefuzz-pmc-harvest": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#pubmed #biomedical #research #data-harvesting #xml
Safety Score: 4/5

Flags: network-access, file-read, file-write, external-api