whisper-gpu-transcribe
Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/allanmeng/whisper-gpu-transcriber-skill
What This Skill Does
The whisper-gpu-transcribe skill is a local-first tool for converting audio and video files into SRT subtitle files. It uses OpenAI's Whisper speech-to-text models and runs entirely on your own hardware, so no audio is sent to external servers. It auto-detects the available GPU backend (NVIDIA CUDA, AMD ROCm, Apple Metal, or Intel XPU) and uses it for acceleration. This makes it a free alternative to subscription-based subtitle services for creators and professionals.
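The skill's own detection logic is not shown here, so the following is only a minimal sketch of how GPU auto-detection could be done with PyTorch. The function name `pick_device` and the order in which backends are checked are assumptions made for illustration. PyTorch is imported lazily, so the sketch falls back to "cpu" when it is not installed.

```python
def pick_device(torch_module=None):
    """Return the first available backend name: cuda, xpu, mps, or cpu.

    Hypothetical helper: the name and the check order are assumptions,
    not the skill's actual implementation.
    """
    if torch_module is None:
        try:
            import torch as torch_module  # optional dependency in this sketch
        except ImportError:
            return "cpu"

    # NVIDIA CUDA and AMD ROCm builds of PyTorch both report through torch.cuda.
    if torch_module.cuda.is_available():
        return "cuda"

    # Intel XPU support is only present in newer PyTorch builds, so guard it.
    xpu = getattr(torch_module, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    # Apple Metal is exposed in PyTorch as the "mps" backend.
    mps = getattr(torch_module.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

The returned string can be passed as the `device` argument when a Whisper model is loaded.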
Installation
To install this skill, use the OpenClaw command-line interface by running: clawhub install openclaw/skills/skills/allanmeng/whisper-gpu-transcriber-skill. Ensure you have Python 3.8+ installed and a PyTorch build that matches your GPU architecture. The openai-whisper dependency is resolved automatically during setup. For the best performance, update your graphics drivers to the latest stable version.
Use Cases
- Content Creation: Effortlessly generate SRT files for YouTube, TikTok, or Instagram videos, saving significant time compared to manual transcription.
- Meeting Transcription: Convert long-form audio recordings from meetings or interviews into searchable text documents.
- Educational Tools: Create study materials and transcripts for podcasts, webinars, or online courses.
- Local Privacy: Keep proprietary or sensitive audio data on your own machine without utilizing cloud-based AI APIs.
Example Prompts
- "Convert interview_recording.mp3 to SRT subtitles for me."
- "Please transcribe /home/user/downloads/meeting.wav to an SRT file using the large-v3-turbo model."
- "Convert current_lecture.mp4 to subtitles, and set the language to Japanese."
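Requests like the ones above come down to two steps: transcribe the audio, then write the timed segments in SRT format. The sketch below illustrates both under stated assumptions. The helper names (`format_timestamp`, `segments_to_srt`, `transcribe_to_srt`) are hypothetical and not part of the skill. The `whisper.load_model` and `model.transcribe` calls come from the openai-whisper package, which is imported lazily so the formatting helpers run without it.

```python
def format_timestamp(seconds):
    """Format a time in seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from dicts that carry "start", "end", and "text" keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Joining with a newline leaves one blank line between subtitle entries.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path, srt_path, model_name="turbo",
                      language=None, device=None):
    """Transcribe a file with openai-whisper and write the result as SRT.

    Hypothetical wrapper: requires the openai-whisper package and ffmpeg.
    """
    import whisper  # imported lazily; not needed for the helpers above

    model = whisper.load_model(model_name, device=device)
    # language=None lets Whisper detect the spoken language itself.
    result = model.transcribe(audio_path, language=language)
    with open(srt_path, "w", encoding="utf-8") as handle:
        handle.write(segments_to_srt(result["segments"]))
```

For the third prompt, a call might look like `transcribe_to_srt("current_lecture.mp4", "current_lecture.srt", language="ja")`, where the output file name is chosen here only for illustration.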
Tips & Limitations
- Download Requirements: The first time you run the tool, it automatically downloads the required model weights (up to 1.5 GB). Make sure you have a stable internet connection for this initial setup.
- Caching: Models are stored in ~/.cache/whisper. If you are short on space, use a symbolic link to point this directory to a larger storage drive. Users in regions with restricted access should download model files manually and place them in the cache folder to prevent timeouts.
- Performance: While 'large-v3' provides the highest accuracy, 'turbo' is recommended for most users as it offers the best balance between speed and quality.
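The symbolic-link tip can be sketched as a small script. This is an illustration only: the function name `relocate_cache` is hypothetical, and it assumes a platform where the current user is allowed to create symlinks. It moves any models that were already downloaded to the new location and leaves a link at the original path.

```python
import shutil
from pathlib import Path


def relocate_cache(cache_dir, target_dir):
    """Move cache_dir's contents to target_dir and leave a symlink behind.

    Hypothetical helper, not part of the skill or of openai-whisper.
    """
    cache_dir = Path(cache_dir).expanduser()
    target_dir = Path(target_dir).expanduser().resolve()

    if cache_dir.is_symlink():
        return cache_dir  # already relocated; nothing to do

    target_dir.mkdir(parents=True, exist_ok=True)

    if cache_dir.is_dir():
        # Carry over any model files that were already downloaded.
        for item in list(cache_dir.iterdir()):
            shutil.move(str(item), str(target_dir / item.name))
        cache_dir.rmdir()
    else:
        # No cache yet: make sure the parent exists so the link can be created.
        cache_dir.parent.mkdir(parents=True, exist_ok=True)

    cache_dir.symlink_to(target_dir, target_is_directory=True)
    return cache_dir
```

A call such as `relocate_cache("~/.cache/whisper", "/mnt/storage/whisper")` would then redirect future downloads, where `/mnt/storage/whisper` is a placeholder for a directory on the larger drive.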
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-allanmeng-whisper-gpu-transcriber-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-write, file-read, code-execution