transcribe
Use this skill for transcript or subtitle requests involving podcast URLs, public audio URLs or files, or raw transcript cleanup. It generates audio, SRT, and TXT artifacts and can optionally clean transcripts using episode-page context.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/dairui1/podcast-transcribe
Transcribe with podcast-helper
Generate transcript artifacts from a podcast episode, audio file, or raw transcript, with an optional cleanup pass that uses episode-page context.
Default Workflow
- Choose a dedicated output directory such as ./out/<episode-slug>/.
- Run npx podcast-helper transcribe <input> --output-dir <dir> --json.
- Add --progress jsonl only when machine-readable progress is needed.
- Report the generated artifact paths for audio, .srt, and .txt.
- Ask whether the user wants cleanup. Do not run cleanup implicitly.
If you are already inside this repository and dist/cli.js exists, node dist/cli.js ... is acceptable. Do not default to repository-local build steps outside this repository.
If you are inside this repository and dist/cli.js is missing, run pnpm run build before using the repo-local entry point.
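The default workflow can be sketched as a small shell function. This is only an illustration: the function name run_transcribe and the DRY_RUN switch are hypothetical and not part of podcast-helper, and the final listing step simply shows the directory contents because the shape of the --json output is not described here.

```shell
# Sketch only: run_transcribe and DRY_RUN are hypothetical names,
# not features of podcast-helper.
run_transcribe() {
  input=$1
  slug=$2
  out_dir="./out/$slug"   # dedicated output directory per episode

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "npx podcast-helper transcribe $input --output-dir $out_dir --json"
  else
    mkdir -p "$out_dir"
    npx podcast-helper transcribe "$input" --output-dir "$out_dir" --json
    # List what was written (on stderr, to keep stdout for the CLI output)
    # so the audio, .srt, and .txt paths can be reported.
    ls -1 "$out_dir" >&2
  fi
  # Cleanup is deliberately not triggered here; ask the user first.
}
```

A call such as run_transcribe "https://example.com/episode.mp3" my-episode would write into ./out/my-episode; the URL is a placeholder.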
Gotchas
- Prefer no-install entry points first: npx, then pnpm dlx, then a globally installed podcast-helper.
- Let the CLI auto-select the engine unless the user explicitly requests a backend or needs offline Apple Silicon transcription.
- Spotify URLs are unsupported because the audio is DRM-protected. Ask for an RSS-backed episode page, Apple Podcasts link, or direct audio URL instead.
- YouTube inputs require yt-dlp.
- Generic episode pages sometimes hide audio metadata. If source resolution fails, download the audio separately and rerun with the file path.
- Hosted transcription failures usually come from a missing or wrong provider API key.
- Local mlx-whisper runs require ffmpeg, python3, and a working runtime from podcast-helper setup mlx-whisper.
- Keep the raw transcript untouched. Cleanup should write a sibling *.cleaned.txt.
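Two of the gotchas above (Spotify inputs and the yt-dlp requirement) can be caught before the CLI is invoked. The following is a rough preflight sketch: classify_input and preflight are hypothetical helper names, and the URL patterns are a simple heuristic assumed for illustration, not the matching logic podcast-helper itself uses.

```shell
# Sketch only: helper names and URL patterns are assumptions.

# Print "spotify", "youtube", "url", or "file" for the given input.
classify_input() {
  case "$1" in
    http://*spotify.com/*|https://*spotify.com/*) echo spotify ;;
    http://*youtube.com/*|https://*youtube.com/*) echo youtube ;;
    http://youtu.be/*|https://youtu.be/*) echo youtube ;;
    http://*|https://*) echo url ;;
    *) echo file ;;
  esac
}

# Return non-zero with a message when the input is known not to work.
preflight() {
  kind=$(classify_input "$1")
  if [ "$kind" = "spotify" ]; then
    echo "Spotify audio is DRM-protected. Provide an RSS-backed episode page, Apple Podcasts link, or direct audio URL." >&2
    return 1
  fi
  if [ "$kind" = "youtube" ] && ! command -v yt-dlp >/dev/null 2>&1; then
    echo "YouTube inputs require yt-dlp on PATH." >&2
    return 1
  fi
  return 0
}
```

Running preflight "$input" before the transcribe command lets the caller ask for an alternative source early instead of waiting for source resolution to fail.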
Command Forms
Default:
npx podcast-helper transcribe <input> --output-dir ./out/<slug> --json
Fallbacks:
pnpm dlx podcast-helper transcribe <input> --output-dir ./out/<slug> --json
podcast-helper transcribe <input> --output-dir ./out/<slug> --json
node dist/cli.js transcribe <input> --output-dir ./out/<slug> --json (only inside this repository)
For offline Apple Silicon:
npx podcast-helper transcribe <input> --engine mlx-whisper --output-dir ./out/<slug> --json
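The preference order among these command forms could be automated roughly as follows. pick_entry_point is a hypothetical helper, not something shipped with podcast-helper, and it leaves out the repository-local node dist/cli.js form because reliably detecting "inside this repository" is not covered here.

```shell
# Sketch only: pick_entry_point is a hypothetical helper.
# Prints the first available entry point in the preferred order:
# npx, then pnpm dlx, then a globally installed podcast-helper.
pick_entry_point() {
  if command -v npx >/dev/null 2>&1; then
    echo "npx podcast-helper"
  elif command -v pnpm >/dev/null 2>&1; then
    echo "pnpm dlx podcast-helper"
  elif command -v podcast-helper >/dev/null 2>&1; then
    echo "podcast-helper"
  else
    echo "No podcast-helper entry point found on PATH." >&2
    return 1
  fi
}
```

One way to use it is entry=$(pick_entry_point) followed by $entry transcribe "$input" --output-dir "./out/$slug" --json, leaving $entry unquoted so that it splits into the command and its arguments.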
Cleanup Branch
Only enter cleanup when the user asks for it or already has a raw transcript.
- Fetch episode context with curl https://r.jina.ai/<podcast-url>.
- Use the page as reference context for obvious ASR repairs, especially names and proper nouns.
- Do not summarize, invent missing content, or overwrite the raw transcript.
- Write a sibling *.cleaned.txt file.
If no episode URL is available, clean conservatively and explicitly say that external episode context was not used.
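The file handling in the cleanup branch might look like the sketch below. Both helper names are hypothetical; the only details taken from this document are the r.jina.ai fetch and the sibling *.cleaned.txt naming. The repair pass itself is not shown.

```shell
# Sketch only: cleaned_path and fetch_episode_context are hypothetical helpers.

# Print the sibling *.cleaned.txt path for a raw transcript path,
# so the raw file is never the write target.
cleaned_path() {
  case "$1" in
    *.txt) echo "${1%.txt}.cleaned.txt" ;;
    *) echo "$1.cleaned.txt" ;;
  esac
}

# Save the reader-rendered episode page ($1 = episode URL) to a file ($2)
# for use as reference context during cleanup.
fetch_episode_context() {
  curl -fsSL "https://r.jina.ai/$1" -o "$2"
}
```

For a raw transcript at ./out/my-episode/episode.txt, cleaned_path yields ./out/my-episode/episode.cleaned.txt, which keeps the raw and cleaned files side by side.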
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-dairui1-podcast-transcribe": {
      "enabled": true,
      "auto_update": true
    }
  }
}