Official | Verified | media | Safety 4/5

whisper-gpu-transcribe

Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/allanmeng/whisper-gpu-transcriber-skill

What This Skill Does

The whisper-gpu-transcribe skill converts audio and video files into SRT subtitle files on your own machine. It uses OpenAI's Whisper speech-to-text models and runs entirely on local hardware, so your audio is never sent to an external server. It detects the available GPU backend automatically and supports NVIDIA CUDA, AMD ROCm, Apple Metal, and Intel XPU, which makes it a practical alternative to subscription-based subtitle services for creators and professionals.
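The skill's own detection logic is not shown on this page. As a rough illustration of what backend auto-detection can look like with PyTorch, here is a minimal sketch; the function name pick_device and the order of the checks are assumptions, not the skill's actual code.

```python
import importlib.util


def pick_device() -> str:
    """Return the first available PyTorch device string, falling back to "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so no GPU backend can be used

    import torch

    # ROCm builds of PyTorch reuse the torch.cuda API, so this check
    # covers both NVIDIA CUDA and AMD ROCm.
    if torch.cuda.is_available():
        return "cuda"

    # torch.xpu is only present in recent PyTorch builds, hence the guard.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    # Apple Metal is exposed through the MPS backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(pick_device())
```

The returned string can be passed as the device argument of whisper.load_model. On a machine without PyTorch or without a supported GPU, the sketch simply reports "cpu".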

Installation

To install this skill, run clawhub install openclaw/skills/skills/allanmeng/whisper-gpu-transcriber-skill from the OpenClaw command-line interface. You need Python 3.8 or later and a PyTorch build that matches your GPU architecture. The openai-whisper dependency is resolved automatically during setup. For best performance, keep your graphics drivers on the latest stable version.
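Before installing, it can help to confirm that the prerequisites above are in place. The following is a small sketch of such a check; the preflight helper is hypothetical and not part of the skill. It relies only on the facts that PyTorch imports as torch and openai-whisper imports as whisper.

```python
import importlib.util
import sys


def preflight(min_python=(3, 8), modules=("torch", "whisper")):
    """Report whether the interpreter and the required packages look usable."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec locates a package without actually importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for check, ok in preflight().items():
    print(f"{check}: {'ok' if ok else 'missing'}")
```

A "missing" entry for whisper before installation is expected, since the setup process is described as resolving that dependency for you.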

Use Cases

  • Content Creation: Generate SRT files for YouTube, TikTok, or Instagram videos instead of transcribing them by hand.
  • Meeting Transcription: Convert long-form audio recordings from meetings or interviews into searchable text documents.
  • Educational Tools: Create study materials and transcripts for podcasts, webinars, or online courses.
  • Local Privacy: Keep proprietary or sensitive audio data on your own machine without using cloud-based AI APIs.

Example Prompts

  1. "Convert interview_recording.mp3 to SRT subtitles for me."
  2. "Please transcribe /home/user/downloads/meeting.wav to an SRT file using the large-v3-turbo model."
  3. "Convert current_lecture.mp4 to subtitles, and set the language to Japanese."
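How the skill turns these prompts into a transcription call is not documented here. As a sketch of the underlying steps, assuming the standard openai-whisper Python API (whisper.load_model and model.transcribe), the model name and language from a prompt could be passed through as follows. The helper names are illustrative only.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render Whisper-style segments (dicts with start, end, text) as SRT."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path, srt_path, model_name="turbo", language=None):
    """Transcribe one file with openai-whisper and write the result as SRT."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language)
    with open(srt_path, "w", encoding="utf-8") as handle:
        handle.write(segments_to_srt(result["segments"]))


# Prompt 3 would roughly correspond to (output file name is an assumption):
# transcribe_to_srt("current_lecture.mp4", "current_lecture.srt", language="ja")
```

The formatting helpers are plain Python, so they can be reused to post-process segments from any source that provides start and end times in seconds.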

Tips & Limitations

  • Download Requirements: The first time you run the tool, it automatically downloads the required model weights (up to 1.5 GB). Make sure you have a stable internet connection for this initial setup.
  • Caching: Models are stored in ~/.cache/whisper. If you are short on space, replace this directory with a symbolic link that points to a larger storage drive.
  • Performance: 'large-v3' provides the highest accuracy, but 'turbo' (the large-v3-turbo model) is recommended for most users because it offers the best balance between speed and quality.
  • Restricted Networks: If access to the model downloads is restricted in your region, download the model files manually and place them in the cache folder to avoid timeouts.
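The symbolic-link tip above can be scripted. The following is a minimal sketch that moves an existing cache directory to another location and leaves a link behind; the relocate_cache helper and the target path in the example are hypothetical.

```python
import shutil
from pathlib import Path


def relocate_cache(cache_dir, target_dir) -> Path:
    """Move cache_dir's contents to target_dir and leave a symlink behind."""
    cache_dir, target_dir = Path(cache_dir), Path(target_dir)
    if cache_dir.is_symlink():
        return cache_dir.resolve()  # already relocated, nothing to do

    target_dir.mkdir(parents=True, exist_ok=True)
    if cache_dir.exists():
        # Move any model files that were already downloaded.
        for item in cache_dir.iterdir():
            shutil.move(str(item), str(target_dir / item.name))
        cache_dir.rmdir()  # now empty, so it can be replaced by the link
    else:
        cache_dir.parent.mkdir(parents=True, exist_ok=True)

    cache_dir.symlink_to(target_dir, target_is_directory=True)
    return target_dir


# Example (the target path is hypothetical):
# relocate_cache(Path.home() / ".cache" / "whisper", Path("/mnt/bigdisk/whisper"))
```

If you call openai-whisper directly rather than through the skill, whisper.load_model also accepts a download_root argument that points at a different model directory, which avoids the link entirely. Manually downloaded model files can be placed in whichever directory ends up being used.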

Metadata

Author: @allanmeng
Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-allanmeng-whisper-gpu-transcriber-skill": {
      "enabled": true,
      "auto_update": true
    }
  }
}
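If your clawhub.json already contains other settings, pasting the snippet by hand risks overwriting them. The sketch below merges the entry programmatically; it assumes clawhub.json is plain JSON at a path you supply, and the enable_plugin helper is illustrative rather than part of any ClawHub tooling.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-allanmeng-whisper-gpu-transcriber-skill"


def enable_plugin(config_path, plugin_id=PLUGIN_ID):
    """Add or update the plugin entry while keeping other settings intact."""
    path = Path(config_path)
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins.setdefault(plugin_id, {}).update({"enabled": True, "auto_update": True})
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example (the file location is an assumption):
# enable_plugin("clawhub.json")
```

Entries for other plugins and any top-level keys are left untouched; only the two flags shown above are set for this plugin.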

Tags (AI)

#audio-transcription #subtitles #gpu-accelerated #whisper #local-ai
Safety Score: 4/5

Flags: file-write, file-read, code-execution