faster-whisper
Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. Supports standard and distilled models with word-level timestamps.
Why use this skill?
Transcribe audio and video files locally with the OpenClaw faster-whisper skill. Get 4-6x faster speeds than standard Whisper with high-accuracy results, entirely offline.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/theplasmak/faster-whisper
What This Skill Does
The faster-whisper skill provides high-performance, local speech-to-text within the OpenClaw ecosystem. By leveraging the CTranslate2 implementation of OpenAI's Whisper model, it achieves 4-6x faster transcription than the original implementation while maintaining identical accuracy. It also supports GPU acceleration, which enables up to 20x realtime transcription performance. This makes it ideal for processing large volumes of audio or video content directly on your local machine, keeping your data private and removing dependency on paid cloud transcription APIs.
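Under the hood, the skill drives the faster-whisper Python library. A minimal sketch of direct library usage is below; the model name, audio file path, and device settings are illustrative assumptions, not values fixed by the skill:

```python
# Sketch of calling the faster-whisper library directly (assumed setup:
# `pip install faster-whisper`; model weights download on first run).

def transcribe_file(audio_path, model_size="distil-small.en"):
    """Transcribe an audio file; return a list of (start, end, text) tuples."""
    # Imported lazily so this module loads even without the package installed.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory low; on an NVIDIA GPU you would typically use
    # device="cuda", compute_type="float16" for maximum throughput.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, word_timestamps=True)
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]

if __name__ == "__main__":
    # "meeting.wav" is a placeholder path for illustration.
    for start, end, text in transcribe_file("meeting.wav"):
        print(f"[{start:7.2f} -> {end:7.2f}] {text}")
```

Note that `transcribe` returns a generator, so the list comprehension above is what actually runs the decoding.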
Installation
You can install this skill directly using the ClawHub command-line interface. Run the following command in your terminal:
clawhub install openclaw/skills/skills/theplasmak/faster-whisper
Once installed, the tool will download the necessary model files upon first execution. Ensure your system meets the requirements for CTranslate2, especially if you intend to utilize NVIDIA GPU acceleration for maximum throughput.
Use Cases
- Media Transcription: Convert lengthy podcasts, interviews, or lectures into text for research or content creation.
- Subtitling: Utilize the word-level timestamp feature to automatically generate precise subtitle files (SRT or VTT).
- Offline Processing: Perfect for environments with restricted internet access, as the model runs entirely locally.
- Batch Workflows: Efficiently transcribe hundreds of audio files in a single automated loop without incurring API costs.
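For the subtitling use case above, the word- and segment-level timestamps can be turned into an SRT file with a small amount of glue code. This is a hedged sketch: the `segments` list is hypothetical sample data standing in for the (start, end, text) values a transcription run would produce:

```python
# Converting segment timestamps into SRT subtitle blocks.

def srt_timestamp(seconds):
    """Format a float number of seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start, end, text) tuples as the text of an .srt file."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

# Hypothetical sample segments, as a transcription pass might yield.
segments = [
    (0.0, 2.48, "Welcome to the show."),
    (2.48, 5.10, "Today we talk about local transcription."),
]
print(to_srt(segments))
```

The same tuples render as WebVTT by swapping the comma for a period in the timestamp and prepending a `WEBVTT` header line.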
Example Prompts
- "Transcribe this meeting audio file from the downloads folder and save it as a text file."
- "Can you generate subtitles for this 30-minute interview video? Make sure to include word-level timestamps."
- "Convert this lecture audio to text using the large-v3-turbo model for the best balance of speed and accuracy."
Tips & Limitations
To optimize performance, match your model choice to your hardware. If you have limited VRAM, opt for the distil-medium.en or distil-small.en models. For complex, multilingual audio, use the large-v3-turbo model. Note that this skill is optimized for file-based transcription and is not intended for real-time streaming audio or very short clips under 10 seconds. Always verify that your file formats are compatible with the FFmpeg/CTranslate2 backends for the smoothest experience.
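The hardware-matching advice above can be expressed as a simple heuristic. This mapping is an illustrative assumption, not part of the skill itself; the VRAM thresholds are rough guesses you should tune to your own GPU:

```python
# Illustrative heuristic for picking a Whisper model size from available VRAM.
# Thresholds are assumptions; adjust for your hardware and accuracy needs.

def pick_model(vram_gb, english_only=True):
    """Suggest a model name based on available GPU memory in gigabytes."""
    if vram_gb >= 8:
        # Plenty of memory: best accuracy/speed balance, handles multilingual audio.
        return "large-v3-turbo"
    if vram_gb >= 4:
        # Mid-range: distilled medium for English, standard medium otherwise.
        return "distil-medium.en" if english_only else "medium"
    # Low memory (or CPU-only): smallest distilled model that still performs well.
    return "distil-small.en" if english_only else "small"

print(pick_model(12))            # ample VRAM
print(pick_model(2))             # constrained VRAM, English audio
```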
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-theplasmak-faster-whisper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags
Flags: file-read, file-write
Related Skills
podcast-agent
Search articles on any topic, generate a two-host dialogue script, and synthesize podcast audio via TTS. Turn long reads into listenable content.
harmonia
Check PyTorch, Transformers, and CUDA compatibility. Detect GPU, driver mismatches, and version conflicts in ML environments. Use when the user sets up ML/AI tools, installs torch or transformers, hits dependency errors, or asks about compatible versions.
ym-mediatoolkit
Streaming video processing toolkit: compress, extract cover images, and convert audio without downloading the full video.
youtube-summarizer
Automatically fetch YouTube video transcripts, generate structured summaries, and send full transcripts to messaging platforms. Detects YouTube URLs and provides metadata, key insights, and downloadable transcripts.
ressemble
Text-to-Speech and Speech-to-Text integration using Resemble AI HTTP API.