Official Verified media Safety 4/5

video2txt

Transcribes local video or audio files into an SRT subtitle file and a plain-text TXT file.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/chentx1243/maple-video2txt

What This Skill Does

The video2txt skill is a local utility that converts video and audio files into text. It uses the faster-whisper library to extract speech from a media file and produces two outputs: an SRT subtitle file with timestamps and a plain TXT transcript. It is optimized for Chinese and automatically converts its output to Simplified Chinese. Model size, beam size, and hardware acceleration (CPU or CUDA) are adjustable, so you can trade transcription speed against accuracy.
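As a rough illustration of the flow described above (not the skill's actual source), the following sketch turns timed speech segments into SRT and TXT text. The helper names `format_timestamp`, `segments_to_srt`, `segments_to_txt`, and `transcribe` are hypothetical, as are the default argument values. `transcribe` assumes faster-whisper's `WhisperModel` interface and imports the library only when called.

```python
from typing import Iterable, List, Tuple

# (start seconds, end seconds, text); a hypothetical shape for this sketch
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT content: numbered cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cue_time = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{cue_time}\n{text.strip()}\n")
    return "\n".join(blocks)


def segments_to_txt(segments: Iterable[Segment]) -> str:
    """Build the plain transcript: one segment per line, no timestamps."""
    return "\n".join(text.strip() for _, _, text in segments) + "\n"


def transcribe(path: str, model_size: str = "small", device: str = "cpu",
               compute_type: str = "int8", beam_size: int = 5) -> List[Segment]:
    """Assumed usage of faster-whisper; requires the package to be installed."""
    from faster_whisper import WhisperModel  # imported lazily on purpose

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(path, beam_size=beam_size)
    return [(s.start, s.end, s.text) for s in segments]
```

The formatting helpers work on plain tuples, so they can be checked without downloading a model. The conversion to Simplified Chinese is not shown here.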

Installation

To install this skill, run the following command in your terminal: clawhub install openclaw/skills/skills/chentx1243/maple-video2txt. The skill requires Python 3.11 or 3.12. Install its dependencies with pip install -r requirements.txt, and make sure ffmpeg and ffprobe are installed and available on your system PATH. The first run downloads the model to the ./models directory, so it needs an active network connection and sufficient disk space.
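The prerequisites above can be checked before the first run. The following is a minimal sketch, assuming only the requirements stated in this section; the function names `missing_tools` and `python_supported` are hypothetical and are not part of the skill.

```python
import shutil
import sys


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def python_supported(version=None):
    """True when the (major, minor) version is 3.11 or 3.12."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in {(3, 11), (3, 12)}


if __name__ == "__main__":
    if not python_supported():
        print("Warning: this skill expects Python 3.11 or 3.12.")
    for tool in missing_tools():
        print(f"Warning: {tool} was not found on PATH.")
```

Running the script prints a warning for each unmet requirement and prints nothing when everything is in place.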

Use Cases

This skill suits content creators, researchers, and office professionals. Use cases include:

Meeting Transcription: Convert recorded business meetings or video conferences into searchable text documents.
Content Creation: Generate accurate SRT files for YouTube or social media videos to improve accessibility and SEO.
Educational Research: Extract lecture content from video materials for note-taking and quick review.
Archiving: Convert video archives into text logs for easy indexing and retrieval.

Example Prompts

"Convert my meeting recording at D:\recordings\project_kickoff.mp4 into a transcript and subtitle file."
"Use the 'small' whisper model to transcribe D:\videos\interview.mp4 and save the output in the C:\transcripts folder."
"Transcribe my video D:\lecture.mp4 using the base model and make sure it processes on the CPU."

Tips & Limitations

Efficiency: Always set background: true when invoking the tool so that terminal popups do not interrupt your workflow.
Monitoring: The script reports progress at every 10%, so you can tell that a long transcription is still running.
Performance: For faster results on a compatible NVIDIA GPU, set --device cuda and, if your hardware supports it, --compute-type float16.
Dependencies: This skill relies on external binary dependencies (ffmpeg/ffprobe). If transcription fails to start, verify your path settings for these tools.
Language: While optimized for Chinese, it supports multiple languages, though performance may vary based on the chosen model size.
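The 10% progress reporting mentioned under Monitoring could work roughly as sketched below. This is an assumption about the approach, not the skill's actual code: it compares each segment's end time with the total media duration (faster-whisper exposes these as `segment.end` and `info.duration`) and emits each milestone once. The function name `progress_milestones` is hypothetical.

```python
def progress_milestones(segment_ends, total_duration, step=10):
    """Yield each percentage milestone (10, 20, ...) once as segments finish.

    segment_ends: end time in seconds of each transcribed segment, in order.
    total_duration: length of the media file in seconds.
    """
    if total_duration <= 0:
        return
    next_mark = step
    for end in segment_ends:
        percent = min(100.0, end / total_duration * 100)
        # A long segment may cross several milestones at once.
        while next_mark <= 100 and percent >= next_mark:
            yield next_mark
            next_mark += step


if __name__ == "__main__":
    for mark in progress_milestones([2.5, 5.0, 10.0], total_duration=10.0):
        print(f"Progress: {mark}%")
```

Because the generator consumes segment end times lazily, it can be driven directly by a streaming transcription loop without buffering all segments first.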

Metadata

Author: @chentx1243

Stars: 3840

Updated: 2026-04-06

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-chentx1243-maple-video2txt": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#transcription #whisper #video-to-text #subtitles #productivity

Safety Score: 4/5

Flags: file-write, file-read, network-access, code-execution

Related Skills

video-to-article

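Generates an illustrated article (Markdown format) from a video, with text and images side by side. Supports local video files or online video URLs (downloaded automatically), and handles the full workflow automatically: text extraction, video frame capture, timeline matching, and article writing.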

chentx1243 3840

video-frame-capture

Capture key frames from video files at fixed time intervals. Use when you need to understand video content by extracting screenshots, or when you need to analyze video frames for content recognition. Supports skipping similar frames to avoid redundant captures.

chentx1243 3840