Official, Verified, Developer Tools, Safety 4/5

html2md

Convert HTML pages to clean, agent-friendly markdown using Readability + Turndown. Strips navigation, ads, footers, cookie banners, social CTAs. Supports URL fetch, local files, stdin, token budgeting, and output flags. Ideal for research tasks, content extraction, and web scraping in agent workflows.

Why use this skill?

Optimize web content for AI agents with html2md. Strip ads, navs, and clutter to get clean, token-optimized markdown for better research and extraction tasks.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/saikatkumardey/html2md

What This Skill Does

html2md converts complex HTML into clean, agent-readable markdown. It uses Mozilla's Readability engine to extract the main content and the Turndown library to convert it to markdown, stripping clutter such as navigation bars, sidebars, advertisements, cookie banners, and social media calls to action. Agents receive mostly the relevant content, which saves context-window space and leaves less irrelevant text to mislead the model. The tool accepts URLs, local files, and standard input, and offers token budgeting to keep output within a model's context limit.

Installation

To install the html2md skill in your OpenClaw environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/saikatkumardey/html2md

The skill requires Node.js version 22 or higher. After installing, navigate to the skill directory, run npm install to resolve dependencies, and run npm link to make the html2md command available globally across your agent workflows.

Use Cases

This skill is well suited to research-heavy AI workflows. Use html2md when you need to ingest long-form articles, documentation pages, or blog posts for summarization, entity extraction, or RAG (Retrieval-Augmented Generation) indexing. It also fits cron-job-based agents that monitor websites for updates, because the cleaned output stays consistent from run to run. Developers can use the --json output flag to feed the extracted text and token metadata directly into programmatic pipelines.
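As a rough sketch of the pipeline integration described above, the following Python wraps the CLI with subprocess. It assumes html2md is on PATH and that --json prints a single JSON object to stdout. The helper names (build_command, parse_output, convert) are hypothetical, and because the JSON schema is not documented in this listing, the result is returned as a plain dict with no field names assumed.

```python
import json
import subprocess


def build_command(source, max_tokens=None, as_json=True):
    """Build the argv for an html2md call (hypothetical helper).

    Only flags named in this listing are used: --max-tokens and --json.
    """
    argv = ["html2md", source]
    if max_tokens is not None:
        argv += ["--max-tokens", str(max_tokens)]
    if as_json:
        argv.append("--json")
    return argv


def parse_output(stdout_text):
    """Parse --json output. Assumption: stdout holds one JSON object."""
    data = json.loads(stdout_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object from html2md --json")
    return data


def convert(source, max_tokens=None):
    """Run html2md and return the parsed JSON (needs the CLI installed)."""
    result = subprocess.run(
        build_command(source, max_tokens),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_output(result.stdout)
```

A call such as convert("page.html", max_tokens=2000) would then return whatever object the tool emits. Inspect its keys once before relying on any particular field.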

Example Prompts

"html2md https://paulgraham.com/greatwork.html --max-tokens 2000 - extract the core thesis of this essay into a clean markdown format for my notes."
"Take this local file, page.html, run html2md on it, and strip all links so I only get the raw text content."
"Fetch the documentation from https://docs.openclaw.com, convert it to markdown, and provide me with the JSON output so I can analyze the token count."

Tips & Limitations

Token Budgeting: Use the --max-tokens flag when dealing with very large documents to avoid exceeding model context windows. The tool keeps headings while truncating body text.
Readability Limits: If a website relies on non-standard layouts (like complex data tables), Readability might return less content than expected; in that case the tool falls back to the raw body.
Network Security: Note that this tool performs direct network requests. Ensure you only provide trusted URLs to avoid unintended SSRF exposure in your agent architecture.
Error Handling: The tool exits with code 1 and writes a message to stderr on timeouts, bad URLs, or file access issues, so automated scripts can detect failures reliably.
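The network-security and error-handling tips above can be combined in a small guard around the CLI call. This is a minimal sketch under stated assumptions, not part of html2md itself: is_url_allowed and run_checked are hypothetical helpers, and the host allowlist is something you would supply. An allowlist check is only a partial SSRF mitigation, since it does not cover redirects or DNS changes after the check.

```python
import ipaddress
import subprocess
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_url_allowed(url, allowed_hosts):
    """Return True only for http(s) URLs whose host is on the allowlist.

    IP-literal hosts are refused outright, so loopback, private, and
    link-local addresses cannot be reached by writing them numerically.
    """
    try:
        parts = urlsplit(url)
        host = parts.hostname  # already lowercased by urlsplit
    except ValueError:
        return False  # malformed URL, e.g. an unclosed IPv6 bracket
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # numeric host: refuse
    except ValueError:
        pass  # not an IP literal; fall through to the allowlist
    return host in allowed_hosts


def run_checked(argv, timeout=30):
    """Run a command; return stdout, or raise RuntimeError carrying stderr.

    subprocess.TimeoutExpired propagates if the command exceeds `timeout`.
    """
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"exit {result.returncode}: {result.stderr.strip()}")
    return result.stdout
```

In an agent, you might call is_url_allowed(url, {"paulgraham.com"}) first and only then pass ["html2md", url, "--max-tokens", "2000"] to run_checked, treating a RuntimeError as the signal to retry or skip the page.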

Metadata

Author: @saikatkumardey

Stars: 1133

Updated: 2026-02-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-saikatkumardey-html2md": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags(AI)

#markdown #web-scraping #content-processing #ai-agent #data-extraction

Safety Score: 4/5

Flags: network-access, file-read
