faster-whisper
Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. Supports standard and distilled models with word-level timestamps.
Why use this skill?
Transcribe audio and video files locally with the OpenClaw faster-whisper skill. Get 4-6x faster speeds than standard Whisper with high-accuracy results, entirely offline.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/theplasmak/faster-whisper
What This Skill Does
The faster-whisper skill provides high-performance, local speech-to-text within the OpenClaw ecosystem. By leveraging the CTranslate2 implementation of OpenAI's Whisper model, it achieves 4-6x faster transcription than the original implementation while maintaining identical accuracy. It also supports GPU acceleration, which enables up to 20x realtime transcription performance. This makes it ideal for processing large volumes of audio or video content directly on your local machine, keeping your data private and removing dependency on paid cloud transcription APIs.
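Under the hood, the skill drives the faster-whisper Python library. A minimal sketch of direct library usage is below; the model name, audio file path, and device settings are illustrative assumptions, not values fixed by the skill:

```python
# Sketch of calling the faster-whisper library directly (assumed setup:
# `pip install faster-whisper`; model weights download on first run).

def transcribe_file(audio_path, model_size="distil-small.en"):
    """Transcribe an audio file; return a list of (start, end, text) tuples."""
    # Imported lazily so this module loads even without the package installed.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory low; on an NVIDIA GPU you would typically use
    # device="cuda", compute_type="float16" for maximum throughput.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, word_timestamps=True)
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]

if __name__ == "__main__":
    # "meeting.wav" is a placeholder path for illustration.
    for start, end, text in transcribe_file("meeting.wav"):
        print(f"[{start:7.2f} -> {end:7.2f}] {text}")
```

Note that `transcribe` returns a generator, so the list comprehension above is what actually runs the decoding.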
Installation
You can install this skill directly using the ClawHub command-line interface. Run the following command in your terminal:
clawhub install openclaw/skills/skills/theplasmak/faster-whisper
Once installed, the tool will download the necessary model files upon first execution. Ensure your system meets the requirements for CTranslate2, especially if you intend to utilize NVIDIA GPU acceleration for maximum throughput.
Use Cases
- Media Transcription: Convert lengthy podcasts, interviews, or lectures into text for research or content creation.
- Subtitling: Utilize the word-level timestamp feature to automatically generate precise subtitle files (SRT or VTT).
- Offline Processing: Perfect for environments with restricted internet access, as the model runs entirely locally.
- Batch Workflows: Efficiently transcribe hundreds of audio files in a single automated loop without incurring API costs.
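For the subtitling use case above, the word- and segment-level timestamps can be turned into an SRT file with a small amount of glue code. This is a hedged sketch: the `segments` list is hypothetical sample data standing in for the (start, end, text) values a transcription run would produce:

```python
# Converting segment timestamps into SRT subtitle blocks.

def srt_timestamp(seconds):
    """Format a float number of seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start, end, text) tuples as the text of an .srt file."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

# Hypothetical sample segments, as a transcription pass might yield.
segments = [
    (0.0, 2.48, "Welcome to the show."),
    (2.48, 5.10, "Today we talk about local transcription."),
]
print(to_srt(segments))
```

The same tuples render as WebVTT by swapping the comma for a period in the timestamp and prepending a `WEBVTT` header line.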
Example Prompts
- "Transcribe this meeting audio file from the downloads folder and save it as a text file."
- "Can you generate subtitles for this 30-minute interview video? Make sure to include word-level timestamps."
- "Convert this lecture audio to text using the large-v3-turbo model for the best balance of speed and accuracy."
Tips & Limitations
To optimize performance, match your model choice to your hardware. If you have limited VRAM, opt for the distil-medium.en or distil-small.en models. For complex, multilingual audio, use the large-v3-turbo model. Note that this skill is optimized for file-based transcription and is not intended for real-time streaming audio or very short clips under 10 seconds. Always verify that your file formats are compatible with the FFmpeg/CTranslate2 backends for the smoothest experience.
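The hardware-matching advice above can be expressed as a simple heuristic. This mapping is an illustrative assumption, not part of the skill itself; the VRAM thresholds are rough guesses you should tune to your own GPU:

```python
# Illustrative heuristic for picking a Whisper model size from available VRAM.
# Thresholds are assumptions; adjust for your hardware and accuracy needs.

def pick_model(vram_gb, english_only=True):
    """Suggest a model name based on available GPU memory in gigabytes."""
    if vram_gb >= 8:
        # Plenty of memory: best accuracy/speed balance, handles multilingual audio.
        return "large-v3-turbo"
    if vram_gb >= 4:
        # Mid-range: distilled medium for English, standard medium otherwise.
        return "distil-medium.en" if english_only else "medium"
    # Low memory (or CPU-only): smallest distilled model that still performs well.
    return "distil-small.en" if english_only else "small"

print(pick_model(12))            # ample VRAM
print(pick_model(2))             # constrained VRAM, English audio
```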
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-theplasmak-faster-whisper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags
Flags: file-read, file-write
Related Skills
podcast-agent
Search articles on any topic, generate a two-host dialogue script, and synthesize podcast audio via TTS. Turn long reads into listenable content.
harmonia
Check PyTorch, Transformers, and CUDA compatibility. Detect GPU, driver mismatches, and version conflicts in ML environments. Use when the user sets up ML/AI tools, installs torch or transformers, hits dependency errors, or asks about compatible versions.
ym-mediatoolkit
Streaming video processing toolkit: compress, extract cover images, and convert audio without downloading the full video.
youtube-summarizer
Automatically fetch YouTube video transcripts, generate structured summaries, and send full transcripts to messaging platforms. Detects YouTube URLs and provides metadata, key insights, and downloadable transcripts.
ressemble
Text-to-Speech and Speech-to-Text integration using Resemble AI HTTP API.