
faster-whisper

Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. Supports standard and distilled models with word-level timestamps.

Why use this skill?

Transcribe audio and video files locally with the OpenClaw faster-whisper skill. Get 4-6x faster transcription than standard Whisper, with high accuracy and fully offline results.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/theplasmak/faster-whisper
Or add the plugin entry to your clawhub.json manually (see Add to Configuration below).

What This Skill Does

The faster-whisper skill provides high-performance, local speech-to-text capabilities within the OpenClaw ecosystem. By leveraging the CTranslate2 implementation of OpenAI's Whisper model, it achieves 4-6x faster transcription speeds compared to the original implementation while maintaining identical accuracy. Designed for efficiency, it supports GPU acceleration which enables up to 20x realtime transcription performance. This makes it an ideal tool for processing large volumes of audio or video content directly on your local machine, ensuring data privacy and removing dependency on paid cloud transcription APIs.
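As a rough sketch of what the skill wraps, a direct call to the underlying faster-whisper Python library looks like the following. The filename, model size, and device settings here are illustrative assumptions, not the skill's actual internals:

```python
# Minimal faster-whisper usage sketch. Assumes the `faster-whisper`
# package is installed and that "meeting.mp3" exists locally.
from faster_whisper import WhisperModel

# On an NVIDIA GPU, float16 gives the highest throughput;
# on CPU, use device="cpu", compute_type="int8" instead.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# word_timestamps=True attaches per-word timings to each segment.
segments, info = model.transcribe("meeting.mp3", word_timestamps=True)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

Note that `segments` is a lazy generator: transcription only runs as you iterate over it, which keeps memory use low for long recordings.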

Installation

You can install this skill directly using the ClawHub command-line interface. Run the following command in your terminal:

clawhub install openclaw/skills/skills/theplasmak/faster-whisper

Once installed, the tool will download the necessary model files upon first execution. Ensure your system meets the requirements for CTranslate2, especially if you intend to utilize NVIDIA GPU acceleration for maximum throughput.

Use Cases

  • Media Transcription: Convert lengthy podcasts, interviews, or lectures into text for research or content creation.
  • Subtitling: Utilize the word-level timestamp feature to automatically generate precise subtitle files (SRT or VTT).
  • Offline Processing: Perfect for environments with restricted internet access, as the model runs entirely locally.
  • Batch Workflows: Efficiently transcribe hundreds of audio files in a single automated loop without incurring API costs.
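For the subtitling use case, word-level timestamps map directly onto the SRT format. A minimal, self-contained sketch of that conversion (the segment data below is hand-written sample input, standing in for what a real transcription would return):

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Convert (start, end, text) tuples into an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)

# Sample data standing in for real transcription output.
sample = [(0.0, 2.4, "Welcome to the show."), (2.4, 5.1, "Today: Whisper.")]
print(segments_to_srt(sample))
```

The same helper works for batch workflows: loop over a directory of audio files, transcribe each, and write one `.srt` per input.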

Example Prompts

  1. "Transcribe this meeting audio file from the downloads folder and save it as a text file."
  2. "Can you generate subtitles for this 30-minute interview video? Make sure to include word-level timestamps."
  3. "Convert this lecture audio to text using the large-v3-turbo model for the best balance of speed and accuracy."

Tips & Limitations

To optimize performance, match your model choice to your hardware:

  • Limited VRAM: opt for the distil-medium.en or distil-small.en models.
  • Complex, multilingual audio: use the large-v3-turbo model.
  • File-based only: this skill is optimized for file-based transcription and is not intended for real-time streaming audio or very short clips under 10 seconds.
  • File formats: always verify your files are compatible with the FFmpeg/CTranslate2 backends for the smoothest experience.

Metadata

Stars: 946 · Views: 1 · Updated: 2026-02-13
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-theplasmak-faster-whisper": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#audio #transcription #whisper #speech-to-text #ml #cuda #gpu
Safety Score: 5/5

Flags: file-read, file-write