jina-reader
Extract clean, readable markdown content from any URL using Jina Reader API. Use when you need to fetch and parse web pages without dealing with HTML, JavaScript rendering, or paywalls. Ideal for research, article summarization, content analysis, and working with search results from tavily-search, web_search, or searxng skills.
Why use this skill?
Use jina-reader to quickly convert complex web pages into clean, AI-ready Markdown. Perfect for research, summarization, and data extraction pipelines.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/smile-xuc/haibo-jina-reader
What This Skill Does
The jina-reader skill is a web-scraping utility that turns messy, complex web pages into clean, structured Markdown. It relies on the Jina Reader API to handle obstacles that often defeat traditional scrapers: heavy JavaScript rendering, paywalls, and complex HTML layouts. The output is AI-ready content that LLM agents can interpret with high fidelity, whether the source is an article, a research paper, or technical documentation. For competitive analysis or long-form summarization alike, the extracted text comes back well formatted and stripped of boilerplate markup.
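As a rough illustration of what happens underneath, the sketch below builds a request against the public Jina Reader endpoint, which accepts the target URL appended to https://r.jina.ai/. The skill's own scripts are not shown in this listing, so the function names `build_reader_request` and `fetch_markdown` are hypothetical and the skill may work differently.

```python
import urllib.request

# Public Jina Reader endpoint: the target URL is appended to this prefix.
READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(target_url, as_json=False, api_key=None):
    """Build (but do not send) a Jina Reader request for target_url.

    Hypothetical helper for illustration; not part of the skill itself.
    """
    if not target_url.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {target_url!r}")
    headers = {}
    if as_json:
        # Ask for a JSON envelope instead of plain Markdown text.
        headers["Accept"] = "application/json"
    if api_key:
        # An API key is optional; pass one only if you have it.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(READER_PREFIX + target_url, headers=headers)


def fetch_markdown(target_url, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    request = build_reader_request(target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from the network call makes the URL and header logic easy to check offline.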
Installation
To integrate jina-reader into your OpenClaw environment, ensure you have the necessary permissions enabled for network access. You can install the skill by running the following command in your terminal:
clawhub install openclaw/skills/skills/smile-xuc/haibo-jina-reader
Once installed, verify the installation by checking your skill manifest. The scripts are typically located in your project's scripts/ directory, allowing for direct command-line execution or programmatic invocation via your agent's task-processing pipelines.
Use Cases
This skill is highly versatile and fits into many data-heavy workflows:
- Academic & Market Research: Extract content from multiple sources to synthesize reports without manually visiting sites.
- Content Curation: Automatically aggregate newsletters or articles from RSS feeds and convert them into a unified format for internal review.
- LLM Context Preparation: Fetch relevant documentation or news to provide the AI with up-to-date context that it wouldn't otherwise have access to.
- Automated Summarization: Pipe long-form URL content directly into analysis scripts to generate executive summaries.
Example Prompts
- "Use jina-reader to fetch the content from https://example.com/tech-article and summarize the key findings for me."
- "Get the latest news from https://news.ycombinator.com/item?id=123456 and export it as a JSON object to my project folder."
- "Run a batch extraction on these three URLs (provided in a list) using the jina-reader skill, then combine the text content into a single document for analysis."
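The batch prompt above boils down to fetching several URLs and joining the results. A minimal sketch of that step follows, assuming some `fetch` callable that returns Markdown for a URL; `combine_pages` is a hypothetical name, not something the skill exposes.

```python
def combine_pages(urls, fetch):
    """Fetch each URL with `fetch` and join the results into one Markdown document.

    Returns (combined_document, failures), where failures is a list of
    (url, error_message) pairs for pages that could not be fetched.
    """
    sections = []
    failures = []
    for url in urls:
        try:
            body = fetch(url).strip()
        except Exception as exc:
            # Record the failure and keep going so one bad URL
            # does not abort the whole batch.
            failures.append((url, str(exc)))
            continue
        sections.append(f"## Source: {url}\n\n{body}")
    return "\n\n---\n\n".join(sections), failures
```

Passing the fetcher in as an argument keeps the combining logic independent of how each page is retrieved.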
Tips & Limitations
- Efficiency: Always use the --format json flag if you intend to programmatically parse metadata (like publication dates) to ensure reliable data structures.
- Networking: As this tool fetches data from the web, ensure your local network firewall allows outgoing requests to the Jina Reader API endpoints.
- Complexity: While the tool handles JavaScript well, highly interactive SPAs (Single Page Applications) might occasionally require retries if the content load is excessively slow.
- Cleanup: When processing large volumes of data, utilize the -o output flag to save results to disk rather than cluttering your terminal buffer.
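The retry advice for slow SPAs can be sketched as a small wrapper with exponential backoff. This is one possible approach, not the skill's built-in behavior; `fetch_with_retries` and its parameters are hypothetical, and `fetch` stands for whatever callable retrieves the page.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on errors or empty content with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            content = fetch(url)
            if content and content.strip():
                return content
            # A slow page may come back empty; treat that as retryable.
            last_error = ValueError("empty content")
        except OSError as exc:
            # urllib's URLError and socket timeouts are OSError subclasses.
            last_error = exc
        if attempt < attempts - 1:
            # Wait base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up on {url} after {attempts} attempts") from last_error
```

The `sleep` parameter exists so the delay can be replaced in tests or tuned by a scheduler.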
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-smile-xuc-haibo-jina-reader": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Flags: network-access, file-write, external-api
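If clawhub.json already contains other plugins, the entry shown above has to be merged in rather than pasted over the file. A minimal sketch of that merge follows, assuming the file is plain JSON with a top-level "plugins" object as in the snippet; `enable_plugin` is a hypothetical helper, not a clawhub command.

```python
import json
from pathlib import Path

PLUGIN_KEY = "official-smile-xuc-haibo-jina-reader"


def enable_plugin(config_path, key=PLUGIN_KEY):
    """Add or update the plugin entry in clawhub.json, keeping other entries intact."""
    path = Path(config_path)
    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("plugins", {})[key] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```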