clip-local
Clips a YouTube video locally using yt-dlp and ffmpeg. Supports auto-highlight detection, translation, and CapCut-style karaoke subtitle burning. Triggers when the user wants local video clipping, highlight extraction, or subtitle generation. Optional GROQ_API_KEY env var enables Whisper transcription fallback when YouTube has no subtitles.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/chyyynh/video-clip-skill
Video Clip (Local)
Requires yt-dlp, ffmpeg, and python3. Check with command -v.
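The same prerequisite check can be sketched in Python, where `shutil.which` plays the role of `command -v`. The helper name `missing_tools` is hypothetical and not part of the plugin:

```python
import shutil

# Tools named in the requirement line above.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg", "python3")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

An empty result means all three tools resolve on PATH; anything listed needs to be installed before the pipeline runs.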
Finding plugin scripts
The ASS karaoke generator is bundled with this plugin. Locate it once at the start (this only searches for the plugin's own bundled file):
ASS_SCRIPT=$(find ~/.claude/plugins -path '*/clip-local/*/scripts/ass-karaoke.py' 2>/dev/null | head -1)
Auto-highlight mode
When the user does NOT specify start/end times (e.g., "幫我剪這個影片的精華" ("clip the highlights of this video for me") or "clip the best parts"):
- Download the full transcript (steps 1–2 below)
- Read the entire transcript and identify 3–5 highlight segments. For each, note:
  - Start and end timestamps
  - A short description of why it's interesting (key insight, funny moment, dramatic turn, etc.)
- Present the highlights to the user as numbered options and ask which ones to clip
- Clip only the segments the user picks, then continue with the normal pipeline (translate, subtitle, etc.)
Pipeline
1. Get video info and original language
yt-dlp --print title --print duration_string --print language \
--no-playlist --no-warnings --force-ipv4 "<URL>"
The third line is the original language code (e.g., en, en-US, ja, zh-Hant). Use the base code (before -) for subtitle download.
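As a minimal sketch, the three printed lines can be split and the base code derived as follows. It assumes the command printed exactly title, duration, and language in that order; `parse_video_info` is a hypothetical helper name:

```python
def base_language(code):
    """Return the part of a language code before the first hyphen."""
    return code.strip().split("-", 1)[0]


def parse_video_info(output):
    """Split the three --print lines into title, duration, and language fields."""
    title, duration, language = output.strip().splitlines()[:3]
    return {
        "title": title,
        "duration": duration,
        "language": language,
        "base_language": base_language(language),
    }


# Example with made-up values:
info = parse_video_info("Some Talk\n12:34\nen-US\n")
print(info["base_language"])
```

The `base_language` value is what replaces `<LANG>` in the subtitle download step.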
2. Download original language subtitles
yt-dlp --write-auto-sub --sub-lang "<LANG>*" --sub-format vtt --skip-download \
--no-playlist --no-warnings --force-ipv4 \
--extractor-args 'youtube:player-client=default,mweb' \
-o "subs" "<URL>"
Replace <LANG> with the base language code from step 1 (e.g., en, ja). The * wildcard matches variants like en-orig. Do NOT use YouTube's auto-translated subs — they are low quality. All translation is done by you.
3. Trim VTT to clip range
When clipping a portion (e.g., 10–130s), filter the VTT to only include cues whose timestamps fall within the range. Keep the original absolute timestamps — do NOT adjust them. The --offset flag in ass-karaoke.py handles the time shift.
When filtering, strip any extra metadata from timestamp lines (e.g., align:start position:0%) — keep only HH:MM:SS.mmm --> HH:MM:SS.mmm. The ASS parser regex expects clean timestamp lines.
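The filtering described above could look roughly like this. It is a sketch under stated assumptions: cues are separated by blank lines, "within the range" is read as the cue lying entirely inside the clip, and `trim_vtt` is a hypothetical name rather than something shipped with the plugin:

```python
import re

# A clean timestamp pair at the start of a line; trailing cue settings are ignored.
TIMESTAMP_LINE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def trim_vtt(vtt_text, clip_start, clip_end):
    """Keep cues that lie entirely within [clip_start, clip_end] (seconds).

    Timestamps are left absolute; only cue settings such as
    `align:start position:0%` are dropped from the timestamp line.
    """
    kept = ["WEBVTT", ""]
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMESTAMP_LINE.match(line)
            if match:
                break
        else:
            continue  # header or note block with no timestamp line
        groups = match.groups()
        start = to_seconds(*groups[:4])
        end = to_seconds(*groups[4:])
        if start >= clip_start and end <= clip_end:
            kept.append(match.group(0))  # clean timestamp only
            kept.extend(lines[i + 1:])   # cue text, unchanged
            kept.append("")
    return "\n".join(kept)
```

For the 10–130s example, `trim_vtt(text, 10, 130)` returns a VTT string containing only the cues inside that window, still carrying their original timestamps.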
4. Translate subtitles
Write and execute a Python script that:
- Parses the trimmed VTT (regex: HH:MM:SS.mmm --> HH:MM:SS.mmm + text lines)
- Collects all text lines into a list
- You translate the list (print a Python list of translated strings)
- Writes a new VTT with identical timestamps and translated text
Example structure:
import re
# Parse original VTT
with open("clip.vtt") as f:
    content = f.read()
cues = re.findall(r'(\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3})\n((?:(?!\d{2}:\d{2}).+\n?)*)', content)
# Translations: fill this list with your translations, one per cue
translations = [
    "translated line 1",
    "translated line 2",
    # ...
]
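The example structure stops at the translations list. One possible way to finish the last step, writing a new VTT with identical timestamps and translated text, is sketched below. It reuses the cue regex from the example; the function name `build_translated_vtt` and the output file name in the usage comment are assumptions:

```python
import re

# Same cue pattern as the example structure: timestamp line, then text lines.
CUE_PATTERN = re.compile(
    r'(\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3})\n'
    r'((?:(?!\d{2}:\d{2}).+\n?)*)'
)


def build_translated_vtt(content, translations):
    """Pair each original timestamp with its translation and return VTT text."""
    cues = CUE_PATTERN.findall(content)
    if len(cues) != len(translations):
        raise ValueError(
            f"{len(cues)} cues but {len(translations)} translations"
        )
    lines = ["WEBVTT", ""]
    for (timestamp, _original_text), translated in zip(cues, translations):
        lines += [timestamp, translated, ""]
    return "\n".join(lines)


# Usage (file names are illustrative):
# with open("clip.vtt", encoding="utf-8") as f:
#     content = f.read()
# with open("clip.translated.vtt", "w", encoding="utf-8") as f:
#     f.write(build_translated_vtt(content, translations))
```

Raising on a length mismatch is a deliberate choice here: a missing or extra translation would otherwise shift every later subtitle onto the wrong timestamp.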
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-chyyynh-video-clip-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}