video2txt
Transcribes local video or audio files into an SRT subtitle file and a plain-text TXT file.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/chentx1243/maple-video2txt
What This Skill Does
The video2txt skill is a local utility that converts video and audio files into text. Using the faster-whisper library, it extracts speech from media files and produces two outputs: an SRT subtitle file with timestamps and a clean TXT file. It is optimized for Chinese and automatically converts its output to Simplified Chinese. Model size, beam size, and the inference device (CPU or CUDA) are adjustable, so you can trade speed against transcription accuracy.
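To make the two output formats concrete, here is a minimal sketch of how timed segments could be rendered as SRT cues and as plain text. It is not the skill's actual code: the `Segment` class and the function names `srt_timestamp`, `to_srt`, and `to_txt` are hypothetical, and the only assumption is that each transcribed segment carries a start time, an end time (both in seconds), and its text.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one transcribed segment (times in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


def to_txt(segments: list[Segment]) -> str:
    """Render the same segments as plain text, one segment per line."""
    return "\n".join(seg.text.strip() for seg in segments)
```

The SRT file keeps the timing needed by video players, while the TXT file drops it so the transcript can be read or searched as ordinary text.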
Installation
To install this skill, run the following command in your terminal: clawhub install openclaw/skills/skills/chentx1243/maple-video2txt. Use Python 3.11 or 3.12, and install the dependencies with pip install -r requirements.txt. The first run downloads the model to the ./models directory, so it needs an active network connection and sufficient disk space. Make sure ffmpeg and ffprobe are available on your system PATH.
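The prerequisites above can be checked before the first run. The following is a small sketch using only the Python standard library; the helper names `python_version_ok` and `missing_tools` are hypothetical and are not part of the skill.

```python
import shutil
import sys


def python_version_ok(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True for Python 3.11 or 3.12, the versions the skill asks for."""
    return version in {(3, 11), (3, 12)}


def missing_tools(tools: tuple[str, ...] = ("ffmpeg", "ffprobe")) -> list[str]:
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_version_ok():
        major, minor = sys.version_info[:2]
        print(f"Python {major}.{minor} found; 3.11 or 3.12 expected")
    for tool in missing_tools():
        print(f"{tool} not found on PATH")
```

If the script prints nothing, the interpreter version and both binaries are in place; it does not verify network access or free disk space for the model download.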
Use Cases
This skill is perfect for creators, researchers, and office professionals. Use cases include:
- Meeting Transcription: Convert recorded business meetings or video conferences into searchable text documents.
- Content Creation: Generate accurate SRT files for YouTube or social media videos to improve accessibility and SEO.
- Educational Research: Extract lecture content from video materials for note-taking and quick review.
- Archiving: Convert video archives into text logs for easy indexing and retrieval.
Example Prompts
- "Convert my meeting recording at D:\recordings\project_kickoff.mp4 into a transcript and subtitle file."
- "Use the 'small' whisper model to transcribe D:\videos\interview.mp4 and save the output in the C:\transcripts folder."
- "Transcribe my video D:\lecture.mp4 using the base model and make sure it processes on the CPU."
Tips & Limitations
- Efficiency: Always use background: true when invoking the tool to prevent terminal popups from interrupting your workflow.
- Monitoring: The script reports progress every 10% to prevent anxiety during long transcriptions.
- Performance: For faster results, ensure you have a compatible NVIDIA GPU and set --device cuda and --compute-type float16 if your hardware supports it.
- Dependencies: This skill relies on external binary dependencies (ffmpeg/ffprobe). If transcription fails to start, verify your path settings for these tools.
- Language: While optimized for Chinese, it supports multiple languages, though performance may vary based on the chosen model size.
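The 10% progress reporting mentioned in the tips can be pictured with a short sketch. This is an illustration under assumptions, not the skill's implementation: it supposes that progress is derived from each segment's end time relative to the total media duration, and the function name `progress_marks` is hypothetical.

```python
from typing import Iterable, Iterator


def progress_marks(
    segment_ends: Iterable[float], duration: float, step: int = 10
) -> Iterator[int]:
    """Yield each multiple of `step` percent once, as segment end times pass it."""
    if duration <= 0:
        return
    next_mark = step
    for end in segment_ends:
        # Clamp to 100 in case the last segment slightly overruns the duration.
        percent = min(100.0, end / duration * 100)
        while next_mark <= 100 and percent >= next_mark:
            yield next_mark
            next_mark += step


if __name__ == "__main__":
    # Segment end times in seconds for a hypothetical 100-second file.
    for mark in progress_marks([5, 12, 50, 100], duration=100):
        print(f"progress: {mark}%")
```

Because each mark is emitted at most once, a long segment that crosses several thresholds produces several reports in a row rather than skipping them.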
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-chentx1243-maple-video2txt": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, network-access, code-execution
Related Skills
video-to-article
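Generates an illustrated article (Markdown format) from a video, with text and images side by side. Supports local video files or online video URLs (downloaded automatically), and handles the full pipeline automatically: text extraction, video frame capture, timeline matching, and article writing.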
video-frame-capture
Capture key frames from video files at fixed time intervals. Use when you need to understand video content by extracting screenshots, or when you need to analyze video frames for content recognition. Supports skipping similar frames to avoid redundant captures.