chaoxing-download

Download PDF documents from Chaoxing (超星) contest/platform viewer URLs and convert them to TXT. Use when the user wants to download files from contestyd.chaoxing.com or 超星, or provides Chaoxing WPS viewer URLs with objectid parameters. Supports single or batch downloads with page count validation and automatic PDF-to-TXT conversion.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/artminding/chaoxing-download

Chaoxing Document Downloader (超星文档下载)

Download PDFs from Chaoxing WPS viewer URLs using the getYunFiles API.

Core Principle

Every Chaoxing viewer URL contains an objectid (32-char hex). Call the getYunFiles API to get the direct PDF link — no cookies or auth tokens needed.

Arguments

$ARGUMENTS contains the user's download request — typically one or more entries with page count, name, and viewer URL. Parse them to extract the data.
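The exact layout of $ARGUMENTS is not specified here, so the following is only a minimal sketch. It assumes one entry per line in the order page count, name, viewer URL, separated by whitespace; the Entry class and the parse_entries helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    pages: int  # page count the user expects
    name: str   # name the user wants the file saved under
    url: str    # Chaoxing viewer URL


def parse_entries(arguments: str) -> List[Entry]:
    """Parse lines shaped like '<pages> <name ...> <url>' (assumed layout)."""
    entries = []
    for line in arguments.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # blank or incomplete line
        pages, *name_parts, url = parts
        if not pages.isdecimal() or not url.startswith(("http://", "https://")):
            continue  # does not match the assumed layout
        entries.append(Entry(int(pages), " ".join(name_parts), url))
    return entries
```

If the real request uses a different layout (for example comma-separated fields), only the splitting logic would need to change; the rest of the workflow depends on the three extracted fields.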

Download Method

Step 1: Extract objectid from each URL

Find the objectid=([a-f0-9]{32}) parameter in each viewer URL.
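As a small illustration of this step, the sketch below applies the same pattern with Python's re module. The extract_objectid helper is a hypothetical name; the pattern itself is the one given above and matches lowercase hex only.

```python
import re
from typing import Optional

# Same pattern as in the step above: 32 lowercase hex characters.
OBJECTID_RE = re.compile(r"objectid=([a-f0-9]{32})")


def extract_objectid(url: str) -> Optional[str]:
    """Return the objectid found in a viewer URL, or None if there is none."""
    match = OBJECTID_RE.search(url)
    return match.group(1) if match else None
```

Returning None instead of raising makes it easy to report which entries in a batch had an unusable URL.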

Step 2: Call getYunFiles API

For each objectid, call:

https://contestyd.chaoxing.com/app/files/{objectid}/getYunFiles?key=allData

Response JSON contains:

data.pdf — direct PDF URL on s3.cldisk.com or s3.ananas.chaoxing.com (preferred)
data.download — alternative download URL with auth tokens (fallback)
data.filename — original filename
data.pagenum — page count
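The call and the field selection might be sketched as follows, using only the standard library. Whether the fields sit under a top-level "data" key or at the top level of the response is an assumption here, so the hypothetical pick_fields helper accepts both; build_api_url and fetch_file_info are likewise illustrative names.

```python
import json
import urllib.request

API_TEMPLATE = "https://contestyd.chaoxing.com/app/files/{objectid}/getYunFiles?key=allData"


def build_api_url(objectid: str) -> str:
    return API_TEMPLATE.format(objectid=objectid)


def pick_fields(payload: dict) -> dict:
    """Select the fields listed above, preferring pdf over download."""
    # Assumption: fields may be nested under "data" or sit at the top level.
    data = payload.get("data", payload)
    if not isinstance(data, dict):
        data = payload
    return {
        "url": data.get("pdf") or data.get("download"),
        "filename": data.get("filename"),
        "pagenum": data.get("pagenum"),
    }


def fetch_file_info(objectid: str, timeout: float = 30) -> dict:
    """Call getYunFiles for one objectid and return the selected fields."""
    with urllib.request.urlopen(build_api_url(objectid), timeout=timeout) as resp:
        payload = json.load(resp)
    return pick_fields(payload)
```

Keeping pick_fields separate from the network call means the selection logic can be checked against a saved response without contacting the server.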

Step 3: Download the PDF

Download from the data.pdf URL directly; no authentication headers are needed. If data.pdf is missing, fall back to data.download.

Save to: ~/Downloads/chaoxing_pdfs/{user-provided name}.pdf
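A minimal sketch of the download step, assuming the standard library is sufficient and that the user-provided name is already safe to use as a file name. The save_pdf helper is a hypothetical name; it expands "~" explicitly because the shell-style home shortcut is not resolved automatically in Python.

```python
import shutil
import urllib.request
from pathlib import Path


def save_pdf(url: str, name: str, out_dir: str = "~/Downloads/chaoxing_pdfs") -> Path:
    """Stream the file at url to <out_dir>/<name>.pdf and return the path."""
    target_dir = Path(out_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}.pdf"
    # Copy in chunks so large PDFs are not held in memory all at once.
    with urllib.request.urlopen(url, timeout=60) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

Names containing path separators or characters that are invalid on the target file system would need sanitizing first; that is left out of this sketch.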

Step 4: Validate page count

Compare data.pagenum with the user's expected page count. Report any mismatch.
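The comparison could look like the sketch below. It assumes data.pagenum may arrive as either a number or a numeric string, which is not confirmed here; page_count_mismatch is a hypothetical helper name.

```python
from typing import Optional


def page_count_mismatch(name: str, expected: int, reported) -> Optional[str]:
    """Return a message describing the mismatch, or None if the counts agree."""
    try:
        actual = int(reported)  # tolerate "12" as well as 12 (assumption)
    except (TypeError, ValueError):
        return f"{name}: API did not report a usable page count (got {reported!r})"
    if actual != expected:
        return f"{name}: expected {expected} pages, API reports {actual}"
    return None
```

In a batch run, collecting the non-None messages and printing them together at the end gives a single mismatch report.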

Step 5: Convert PDF to TXT (with OCR fallback)

After downloading each PDF, automatically extract text to a plain text file. Use a two-stage approach: native text extraction first, then OCR fallback for image-based pages.

Prerequisites:

pip install pymupdf rapidocr-onnxruntime

Conversion method (Python):

import os
import sys

import fitz  # PyMuPDF
from rapidocr_onnxruntime import RapidOCR

# Windows consoles often default to a non-UTF-8 code page
if sys.platform == "win32":
    sys.stdout.reconfigure(encoding="utf-8")

ocr = RapidOCR()
# "~" is not expanded automatically, so expand it before opening the file
pdf_path = os.path.expanduser("~/Downloads/chaoxing_pdfs/{name}.pdf")
doc = fitz.open(pdf_path)
all_text = []
native_pages = ocr_pages = 0

for i, page in enumerate(doc):
    # Page markers use "第N页" ("page N")
    # Stage 1: try native text extraction
    native = page.get_text().strip()
    if len(native) > 50:
        all_text.append(f"--- 第{i+1}页 ---\n{native}")
        native_pages += 1
        continue
    # Stage 2: OCR fallback for image-based pages
    pix = page.get_pixmap(dpi=200)
    img_bytes = pix.tobytes("png")
    result, _ = ocr(img_bytes)
    ocr_text = "\n".join(item[1] for item in result) if result else ""
    if ocr_text:
        label = "OCR"
        ocr_pages += 1
    else:
        label = "empty"
    all_text.append(f"--- 第{i+1}页 [{label}] ---\n{ocr_text}")

doc.close()
full_text = "\n".join(all_text)

# Write the text next to the PDF, swapping only the file extension
txt_path = os.path.splitext(pdf_path)[0] + ".txt"
with open(txt_path, "w", encoding="utf-8") as f:
    f.write(full_text)

# Summary: count pages while processing instead of searching the page text,
# so native text that happens to contain "[OCR]" cannot skew the totals
print(f"Native: {native_pages}p, OCR: {ocr_pages}p, Total: {len(full_text)} chars")

Output files per download:

{name}.pdf — original PDF
{name}.txt — plain text extraction (native + OCR pages marked with [OCR])

Metadata

Author: @artminding

Stars: 4473

Updated: 2026-05-01


Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-artminding-chaoxing-download": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.
