What This Skill Does

The whisper-gpu-transcribe skill converts audio and video files into SRT subtitle files on your own machine. It uses OpenAI's Whisper speech-to-text models and runs entirely on local hardware, so sensitive recordings are never sent to external servers. The skill auto-detects GPU acceleration and supports NVIDIA CUDA, AMD ROCm, Apple Metal, and Intel XPU. For creators and professionals, this makes it a fast, local alternative to subscription-based subtitle services.
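
The skill's own detection logic is not shown here, so the following is only a minimal sketch of how GPU auto-detection is commonly done with PyTorch. The function name pick_device and the order of the checks are assumptions for illustration. ROCm builds of PyTorch report through the same torch.cuda interface as NVIDIA builds, which is why there is no separate ROCm branch.

```python
def pick_device():
    """Return the name of the best available PyTorch device, or "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate.
        return "cpu"

    # NVIDIA CUDA and AMD ROCm builds both report through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"

    # Intel GPUs: torch.xpu only exists in recent PyTorch builds.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    # Apple Silicon via the Metal Performance Shaders backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(device)
```

The returned string can be passed as the device argument when loading a model, for example whisper.load_model("turbo", device=device).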

Installation

To install this skill, run the following command with the OpenClaw command-line interface: clawhub install openclaw/skills/skills/allanmeng/whisper-gpu-transcriber-skill. You need Python 3.8 or later and a PyTorch build that matches your GPU architecture. The openai-whisper dependency is resolved automatically during setup. For best performance, update your graphics drivers to the latest stable version.
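
The requirements above can be checked before the first run. This is a hedged sketch, not part of the skill: check_prerequisites is a hypothetical helper, and it only confirms that the Python version is new enough and that the torch and whisper modules (the import name of openai-whisper) can be found. It does not verify that the PyTorch build matches your GPU.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 8), modules=("torch", "whisper")):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("module %r is not importable" % name)
    return problems


for problem in check_prerequisites():
    print("Missing prerequisite:", problem)
```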

Use Cases

Content Creation: Generate SRT files for YouTube, TikTok, or Instagram videos instead of transcribing them by hand.
Meeting Transcription: Convert long audio recordings of meetings or interviews into searchable text.
Educational Tools: Create study materials and transcripts for podcasts, webinars, or online courses.
Local Privacy: Keep proprietary or sensitive audio on your own machine without using cloud-based AI APIs.

Example Prompts

"Convert interview_recording.mp3 to SRT subtitles for me."
"Please transcribe /home/user/downloads/meeting.wav to an SRT file using the large-v3-turbo model."
"Convert current_lecture.mp4 to subtitles, and set the language to Japanese."
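
Each of these prompts ends in an SRT file. As an illustration of that last step, the sketch below turns a list of timed segments into SRT text. The helper names srt_timestamp and segments_to_srt are hypothetical, and the demo segments are made-up sample data. The skill's actual writer may differ.

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments with "start", "end", and "text" keys as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Made-up sample segments, shaped like the "segments" list Whisper returns.
demo = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": " Thanks for joining."},
]
print(segments_to_srt(demo))
```

With openai-whisper installed, the "segments" list in the result of model.transcribe(path, language="ja") has this shape, so it can be passed to segments_to_srt directly.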

Tips & Limitations

Download Requirements: On first run, the tool automatically downloads the required model weights (about 1.5 GB for the turbo model; large-v3 is roughly twice that size). Ensure you have a stable internet connection for this initial download.
Caching: Models are stored in ~/.cache/whisper. If you are short on space, replace this directory with a symbolic link that points to a location on a larger drive.
Performance: 'large-v3' provides the highest accuracy, but 'turbo' (also named 'large-v3-turbo') is recommended for most users because it offers the best balance between speed and quality.
Manual Download: If access to the model download servers is restricted in your region, download the model files manually and place them in the cache folder to avoid timeouts.
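
The symbolic-link tip for the cache directory can be scripted. The following is a minimal sketch under stated assumptions: relocate_cache is a hypothetical helper, the target path in the usage comment is a placeholder, and creating symbolic links on Windows may require elevated privileges.

```python
import os
import shutil
from pathlib import Path


def relocate_cache(cache_dir, target_dir):
    """Move cache_dir's contents to target_dir and leave a symlink behind."""
    cache_dir = Path(cache_dir).expanduser()
    target_dir = Path(target_dir).expanduser()

    if cache_dir.is_symlink():
        # Already relocated; nothing to do.
        return cache_dir

    target_dir.mkdir(parents=True, exist_ok=True)
    # Use an absolute target so the link does not depend on the working directory.
    target_dir = target_dir.resolve()

    if cache_dir.is_dir():
        # Carry over any weights that were already downloaded.
        for item in cache_dir.iterdir():
            shutil.move(str(item), str(target_dir / item.name))
        cache_dir.rmdir()
    else:
        cache_dir.parent.mkdir(parents=True, exist_ok=True)

    os.symlink(target_dir, cache_dir, target_is_directory=True)
    return cache_dir


# Example call (placeholder target path, adjust to your own drive):
# relocate_cache("~/.cache/whisper", "/mnt/bigdrive/whisper-cache")
```

Files that were downloaded manually can be placed in the target directory afterwards; they remain visible through ~/.cache/whisper.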
