
epub

Use this skill whenever the user wants to read, parse, extract content from, modify, or otherwise process an .epub file. Triggers include any mention of ".epub", "ebook", "epub file", or requests to extract chapters, table of contents, text, images, or metadata from an ebook. Also use when the user wants to convert epub content to another format, inspect epub structure, or edit epub files. Since epub files are ZIP archives in disguise, this skill uses an unzip-then-parse approach that works on any DRM-free epub and needs no special libraries. Use this skill even for seemingly simple epub tasks like "read this epub" or "show me the chapters"; the extraction workflow is still needed.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/gaojizhou/epub

EPUB Processing Guide

Core Insight: EPUB is a ZIP Archive

An .epub file is simply a ZIP archive with a specific internal structure. The most reliable way to process any epub is:

  1. Copy the file to the working directory
  2. Rename it from .epub to .zip
  3. Unzip it into a folder
  4. Find and read the navigation/TOC file first (e.g. nav.xhtml, nav.html, toc.ncx)
  5. Then read content files as needed

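This approach works for any well-formed, DRM-free epub and requires no special epub libraries. DRM-protected books still unzip, but their content files are encrypted and cannot be read this way.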
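As a quick check of the "EPUB is a ZIP archive" point, the sketch below uses Python's standard `zipfile` module to confirm that a file is a ZIP and to list its members without extracting anything. The function name `list_epub_entries` is illustrative, not part of this skill.

```python
import zipfile


def list_epub_entries(epub_path):
    """Return the member names inside an EPUB, or raise ValueError if it is not a ZIP."""
    if not zipfile.is_zipfile(epub_path):
        raise ValueError(f"{epub_path} is not a ZIP archive")
    with zipfile.ZipFile(epub_path) as archive:
        # namelist() preserves the order in which members were stored
        return archive.namelist()
```

Calling `list_epub_entries("/home/claude/book.epub")` gives a first look at the layout (for example, whether content lives under OEBPS/ or EPUB/) before anything is written to disk.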


Step-by-Step Workflow

Step 1: Extract the EPUB

# Copy uploaded file to working directory
cp /mnt/user-data/uploads/book.epub /home/claude/book.epub

# Make a copy with a .zip name and extract it
cp /home/claude/book.epub /home/claude/book.zip
unzip -o /home/claude/book.zip -d /home/claude/book_extracted/

# List the extracted contents
find /home/claude/book_extracted/ -type f | sort
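Where a shell is not convenient, the same extraction can be sketched in Python. This is a minimal equivalent of the commands above using the standard `zipfile` module, which reads the .epub directly, so no renamed copy is needed. The helper name `extract_epub` is an assumption for illustration.

```python
import zipfile
from pathlib import Path


def extract_epub(epub_path, dest_dir):
    """Unpack an EPUB into dest_dir and return the extracted files, sorted by path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(dest)
    # Mirror `find ... -type f | sort`: files only, in sorted order
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

For example, `extract_epub("/home/claude/book.epub", "/home/claude/book_extracted")` would produce the same folder the later steps read from.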

Step 2: Find the Navigation File (Highest Priority)

The navigation file is the table of contents — it tells you the book's structure, chapter order, and file layout. Always find and read this first.

# Look for candidate nav files (output is sorted by path; apply the priority order below)
find /home/claude/book_extracted/ -type f \( \
  -name "nav.xhtml" -o \
  -name "nav.html" -o \
  -name "toc.ncx" -o \
  -name "*nav*" -o \
  -name "*toc*" \
\) | sort

Nav file priority order:

  1. nav.xhtml or nav.html — EPUB3 navigation document (preferred)
  2. toc.ncx — EPUB2 navigation control file (older format)
  3. Any file with "nav" or "toc" in its name

# Read the nav file to understand structure
cat /home/claude/book_extracted/OEBPS/nav.xhtml
# or
cat /home/claude/book_extracted/EPUB/nav.html
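Because the `find` output is sorted by path and not by priority, picking the right file is still a manual step. The sketch below applies the priority order listed above to an extracted folder. The function name `find_nav_file` and the tie-break (alphabetical path within the same priority level) are assumptions made for this example.

```python
from pathlib import Path


def find_nav_file(extracted_root):
    """Return the best navigation-file candidate under extracted_root, or None."""

    def rank(path):
        name = path.name.lower()
        if name in ("nav.xhtml", "nav.html"):
            return 0  # EPUB3 navigation document
        if name == "toc.ncx":
            return 1  # EPUB2 navigation control file
        if "nav" in name or "toc" in name:
            return 2  # anything else that looks like a nav/TOC file
        return None

    candidates = [
        (rank(p), str(p), p)
        for p in Path(extracted_root).rglob("*")
        if p.is_file() and rank(p) is not None
    ]
    # Lowest rank wins; the path string breaks ties deterministically
    return min(candidates)[2] if candidates else None
```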

Step 3: Find the OPF Package File

The .opf file (Open Packaging Format) holds the book's metadata, the manifest listing every file in the package, and the spine that sets the reading order.

# Find the OPF file
find /home/claude/book_extracted/ -name "*.opf" | head -5

# Read it for metadata and spine (reading order)
cat /home/claude/book_extracted/OEBPS/content.opf

The <spine> element in the OPF file defines chapter reading order. The <metadata> block has title, author, language, etc.
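To show how the spine and manifest fit together, here is a minimal sketch that reads the title and the reading order with the standard library. It locates the OPF through META-INF/container.xml, which is where the EPUB container format records the package file's path, so it does not rely on the file being named content.opf. The function name `read_package` is hypothetical, and the sketch assumes manifest hrefs are plain relative paths (no percent-encoding).

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespaces used by container.xml, OPF package files, and Dublin Core metadata
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_package(extracted_root):
    """Return (title, spine_paths) for an extracted EPUB directory."""
    root = Path(extracted_root)

    # container.xml points at the OPF package file
    container = ET.parse(root / "META-INF" / "container.xml")
    rootfile = container.find(".//c:rootfile", NS)
    opf_path = root / rootfile.get("full-path")

    package = ET.parse(opf_path).getroot()
    title = package.findtext("opf:metadata/dc:title", default="", namespaces=NS)

    # The manifest maps item ids to hrefs; the spine lists ids in reading order
    manifest = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    spine = [
        opf_path.parent / manifest[ref.get("idref")]
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]
    return title, spine
```

The returned paths are resolved relative to the OPF file's folder, so they can be opened directly in Step 4.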

Step 4: Read Content Files

# Find all HTML/XHTML content files
find /home/claude/book_extracted/ -type f \( -name "*.html" -o -name "*.xhtml" \) | sort

# Read a specific chapter
cat /home/claude/book_extracted/OEBPS/chapter01.xhtml

To extract clean text from HTML content:

from bs4 import BeautifulSoup

with open("/home/claude/book_extracted/OEBPS/chapter01.xhtml", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")
    
# Remove script/style tags
for tag in soup(["script", "style"]):
    tag.decompose()

text = soup.get_text(separator="\n", strip=True)
print(text)
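If BeautifulSoup is not installed, a rough fallback can be sketched with Python's built-in `html.parser`. It is a simplified stand-in for the snippet above, not a replacement for it: it skips script and style content and joins the remaining text nodes with newlines. The names `TextExtractor` and `html_to_text` are illustrative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(markup):
    """Return the text content of an HTML/XHTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.parts)
```

Usage would look like `html_to_text(open(path, encoding="utf-8").read())` for each chapter file.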

Typical EPUB Directory Structure

Metadata

Author: @gaojizhou
Stars: 2387
Views: 1
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-gaojizhou-epub": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.