pmc-harvest
Fetch articles from PubMed Central using NCBI APIs. Search journals, retrieve full text via OAI-PMH, batch harvest for RAG pipelines. No API key required.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/angusthefuzz/pmc-harvest
What This Skill Does
The pmc-harvest skill interfaces directly with the PubMed Central (PMC) database of the National Center for Biotechnology Information (NCBI). It lets OpenClaw users search, retrieve, and process medical and scientific literature programmatically. Using the E-utilities suite and the OAI-PMH protocol, the skill extracts full-text XML articles, metadata, and abstracts. This makes it useful for building RAG (Retrieval-Augmented Generation) pipelines, conducting systematic reviews, or archiving open-access scientific data, all without an individual API key.
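As a rough illustration of the two NCBI interfaces mentioned above, the sketch below builds an E-utilities search URL and an OAI-PMH `GetRecord` URL. It is not the skill's own code: the function names `buildSearchUrl` and `buildGetRecordUrl` are hypothetical, and the OAI-PMH base URL is an assumption taken from PMC's long-standing documentation, so check the current PMC docs before relying on it.

```javascript
// Hypothetical helpers; not part of the pmc-harvest skill itself.
const EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";
// Assumption: long-documented PMC OAI-PMH endpoint; it may have moved.
const OAI_BASE = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi";

// Build an ESearch URL that looks up PMC articles by journal and year.
function buildSearchUrl(journal, year, retmax = 20) {
  const term = `"${journal}"[journal] AND ${year}[pdat]`;
  const params = new URLSearchParams({
    db: "pmc",
    term,
    retmode: "json",
    retmax: String(retmax),
  });
  return `${EUTILS_BASE}/esearch.fcgi?${params}`;
}

// Build an OAI-PMH GetRecord URL for one article's full-text XML.
function buildGetRecordUrl(pmcid, oaiBase = OAI_BASE) {
  // OAI identifiers use the numeric part of the PMCID.
  const numericId = String(pmcid).replace(/^PMC/i, "");
  if (!/^\d+$/.test(numericId)) {
    throw new Error(`Not a valid PMCID: ${pmcid}`);
  }
  const params = new URLSearchParams({
    verb: "GetRecord",
    identifier: `oai:pubmedcentral.nih.gov:${numericId}`,
    metadataPrefix: "pmc", // full-text record format
  });
  return `${oaiBase}?${params}`;
}
```

Either URL can then be passed to `fetch` in a recent Node.js release. Keeping URL construction separate from the network call makes the query logic easy to check without touching NCBI servers.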
Installation
To install this skill, run the following command in your terminal:
clawhub install openclaw/skills/skills/angusthefuzz/pmc-harvest
Ensure you have a Node.js environment configured within your OpenClaw directory, as the skill utilizes native JS scripts to handle API interactions and XML parsing.
Use Cases
- Research Automation: Automate the collection of new studies from specific journals to keep your RAG database current.
- Systematic Reviews: Efficiently pull full-text content for large sets of PMCIDs to speed up literature screening.
- Data Mining: Extract raw JATS XML structures for custom text analysis or entity recognition tasks.
- Content Aggregation: Batch harvest open-access articles from journals like 'Stroke' or 'BMC Neurology' for offline analysis.
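For the data-mining use case, a minimal sketch of pulling plain text out of a JATS XML document might look like the following. The function name `extractBodyText` is hypothetical, and the regex approach is a simplification chosen to avoid dependencies; the skill's actual parser is not described here, and a real XML parser would be more robust for production use.

```javascript
// Hypothetical helper; a simplified stand-in for real XML parsing.
function extractBodyText(jatsXml) {
  // Grab the contents of the first <body> element, if any.
  const match = jatsXml.match(/<body\b[^>]*>([\s\S]*?)<\/body>/);
  if (!match) return null;
  return match[1]
    .replace(/<[^>]+>/g, " ") // drop all tags, keeping their text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&") // decode &amp; last to avoid double decoding
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}
```

The function returns `null` when no `<body>` element is present, which can happen for records that carry only front matter.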
Example Prompts
- "Search for the latest articles in the 'Journal of Stroke' published in 2025 and list the titles for me."
- "Download the full text XML for PMC12345678 and extract the main body content."
- "Run a batch harvest using the journals.json file to gather all open-access articles from the configured journals for my current project."
Tips & Limitations
- Open-Access Only: This skill can only retrieve full text for PMC records in the open-access subset. PMC's OAI-PMH service does not return full text for restricted or paywalled articles.
- Rate Limiting: Without an API key, NCBI limits E-utilities traffic to about 3 requests per second. For large-scale batch operations, add a delay between requests to avoid being blocked.
- Peak Time Sensitivity: NCBI asks that large jobs avoid weekday peak hours (5 AM to 9 PM ET). Scheduling heavy harvesting jobs for off-peak times reduces load on NCBI servers and lowers the risk of throttling.
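One way to space out requests, sketched under the assumption of a 3-requests-per-second budget, is a small limiter that hands out time slots. The name `createRateLimiter` and its `reserve`/`schedule` methods are hypothetical and not part of the skill.

```javascript
// Hypothetical limiter; the injectable clock (`now`) is only for testability.
function createRateLimiter(requestsPerSecond = 3, now = Date.now) {
  const minIntervalMs = Math.ceil(1000 / requestsPerSecond);
  let nextSlot = 0; // earliest time the next request may start

  // Reserve the next free slot and return how long the caller must wait.
  function reserve() {
    const current = now();
    const startAt = Math.max(current, nextSlot);
    nextSlot = startAt + minIntervalMs;
    return startAt - current;
  }

  // Run an async task once its reserved slot arrives.
  async function schedule(task) {
    const waitMs = reserve();
    if (waitMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
    return task();
  }

  return { reserve, schedule };
}
```

A batch job could wrap each request as `limiter.schedule(() => fetch(url))`. Slots are reserved up front, so calls issued in a burst are still spread at least 334 ms apart.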
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-angusthefuzz-pmc-harvest": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-read, file-write, external-api