Official Verified utilities Safety 4/5

Faster Whisper Local Service

Skill by neldar

Why use this skill?

Deploy a private, offline speech-to-text service for OpenClaw using faster-whisper. Reduce latency and eliminate API costs with local voice transcription.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/neldar/faster-whisper-local-service

What This Skill Does

The Faster Whisper Local Service provides a private, local speech-to-text (STT) backend for OpenClaw, built on the optimized faster-whisper implementation. It runs as a lightweight HTTP microservice on localhost and accepts audio from OpenClaw interfaces such as WebChat, Telegram bots, or microphone stream integrations. Because the models run locally, no calls to paid cloud transcription APIs are needed. The model weights are downloaded once on first use; after that, transcription works entirely offline and your voice data never leaves your machine. Audio normalization and conversion are handled by GStreamer, so that diverse audio inputs remain compatible.
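
As a rough illustration of how a client might talk to a localhost HTTP transcription service like this one, the sketch below builds a POST request carrying raw audio bytes. The port (8000), the /transcribe path, the raw-body upload format, and the helper names are all hypothetical placeholders, not values taken from this skill; check the skill's own documentation for the real endpoint.

```python
import urllib.request

# Hypothetical defaults: the real port and path are defined by the skill,
# not by this sketch.
ASSUMED_PORT = 8000
ASSUMED_PATH = "/transcribe"


def build_transcribe_request(audio: bytes,
                             port: int = ASSUMED_PORT,
                             path: str = ASSUMED_PATH,
                             content_type: str = "audio/ogg") -> urllib.request.Request:
    """Build (but do not send) a POST request to the local STT service."""
    url = f"http://127.0.0.1:{port}{path}"
    return urllib.request.Request(
        url,
        data=audio,
        headers={"Content-Type": content_type},
        method="POST",
    )


def transcribe(audio: bytes, timeout: float = 60.0) -> str:
    """Send the audio and return the response body as text.

    Assumes the service replies with a UTF-8 body; the actual response
    format (plain text or JSON) is not specified here.
    """
    request = build_transcribe_request(audio)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Targeting 127.0.0.1 explicitly keeps the request on the loopback interface, which matches the localhost-only design described above.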

Installation

Install the skill with the clawhub command: clawhub install openclaw/skills/skills/neldar/faster-whisper-local-service. Then navigate to the skill directory and run bash scripts/deploy.sh. The script sets up a Python virtual environment, installs the required dependencies, and registers openclaw-transcribe.service as a systemd user service. To customize the deployment, set environment variables such as WHISPER_MODEL_SIZE or TRANSCRIBE_PORT before running the deploy script; this lets you match resource usage to your hardware.
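
The deployment step can be sketched as a small wrapper that sets the two environment variables named above and then runs the deploy script. This is a minimal sketch, not the skill's own tooling: the helper names build_deploy_env and deploy are hypothetical, and the default port value (8000) is an assumed placeholder.

```python
import os
import subprocess
from pathlib import Path


def build_deploy_env(model_size: str, port: int, base_env=None) -> dict:
    """Return a copy of the environment with the two deploy variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["WHISPER_MODEL_SIZE"] = model_size
    env["TRANSCRIBE_PORT"] = str(port)  # environment values must be strings
    return env


def deploy(skill_dir: Path, model_size: str = "base", port: int = 8000) -> None:
    """Run scripts/deploy.sh from the skill directory with the chosen settings.

    The port default is an assumption for illustration only.
    """
    subprocess.run(
        ["bash", "scripts/deploy.sh"],
        cwd=skill_dir,
        env=build_deploy_env(model_size, port),
        check=True,  # raise CalledProcessError if the script fails
    )
```

After a successful deployment, systemctl --user status openclaw-transcribe.service should report whether the unit is running.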

Use Cases

This skill suits users who need strict data privacy, such as legal or medical professionals transcribing sensitive consultations locally. It also suits users who want to eliminate the recurring costs of commercial STT APIs. Developers building voice-controlled automation workflows benefit as well: keeping transcription within the local network avoids round-trips to the cloud and keeps latency low.

Example Prompts

  1. "OpenClaw, transcribe the following voice message: [upload audio file]"
  2. "Please listen to my microphone input and summarize the meeting notes in bullet points."
  3. "Transcribe this Telegram voice clip and translate it to English after the text is generated."

Tips & Limitations

Choose a model size that fits your available RAM: the tiny or base models are recommended for low-resource systems, while large-v3 offers better accuracy on complex dialects if you have sufficient memory. The first run takes extra time because the model weights must be downloaded (up to 3 GB for the largest model). Make sure gst-launch-1.0 is installed; the service depends on GStreamer for audio processing. The service binds only to localhost, which limits exposure, but you should still check that your OS firewall rules control access to the service port as intended.
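
A quick preflight check for the GStreamer dependency could look like the sketch below. It only tests whether gst-launch-1.0 can be found on PATH; the helper name missing_tools is hypothetical, and the check says nothing about which GStreamer plugins are installed.

```python
import shutil


def missing_tools(tools=("gst-launch-1.0",)) -> list:
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Running this before the deploy script gives an early, readable error instead of a failure partway through audio processing.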

Metadata

Author: @neldar
Stars: 1335
Views: 1
Updated: 2026-02-23

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-neldar-faster-whisper-local-service": {
      "enabled": true,
      "auto_update": true
    }
  }
}
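
If you would rather merge this entry programmatically than paste it by hand, the sketch below shows one way to do so while preserving any plugins already listed. It is an illustration under assumptions: the helper names are hypothetical, the location of clawhub.json is not stated on this page and must be supplied by you, and the file is assumed to be plain JSON with a top-level "plugins" object.

```python
import json
from pathlib import Path

PLUGIN_NAME = "official-neldar-faster-whisper-local-service"


def enable_plugin(config: dict, name: str = PLUGIN_NAME) -> dict:
    """Return a new config dict with the plugin entry added or updated."""
    plugins = dict(config.get("plugins", {}))  # copy so the input is not mutated
    plugins[name] = {"enabled": True, "auto_update": True}
    return {**config, "plugins": plugins}


def update_config_file(path: Path) -> None:
    """Read clawhub.json, enable the plugin, and write the result back.

    Starts from an empty config if the file does not exist yet.
    """
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

Back up the existing file before calling update_config_file, since it overwrites the file in place.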

Tags (AI)

#stt #transcription #whisper #offline #voice-to-text

Safety Score: 4/5

Flags: network-access, file-write, file-read, code-execution

Related Skills

webchat-voice-proxy

Voice input and microphone button for OpenClaw WebChat Control UI. Adds a mic button to chat, records audio via browser MediaRecorder, transcribes locally via faster-whisper, and injects text into the conversation. Includes HTTPS/WSS reverse proxy, TLS cert management, and gateway hook for update safety. Fully local speech-to-text, no API costs. Real-time VU meter shows voice activity. Push-to-Talk (hold to speak) and Toggle mode (click start/stop), switchable via double-click. Keyboard shortcuts: Ctrl+Space PTT, Ctrl+Shift+M continuous recording. Localized UI (English, German, Chinese built-in, extensible). Keywords: voice input, microphone, WebChat, Control UI, speech to text, STT, local transcription, MediaRecorder, HTTPS proxy, voice button, mic button, push-to-talk, PTT, keyboard shortcut, i18n, localization.

neldar 1335

webchat-voice-full-stack

One-step full-stack installer for OpenClaw WebChat voice input with local speech-to-text. Deploys faster-whisper STT backend plus HTTPS/WSS WebChat proxy with mic button in one command. Push-to-Talk (hold to speak) and Toggle mode with keyboard shortcuts (Ctrl+Space PTT, Ctrl+Shift+M continuous recording). Real-time VU meter, localized UI (English, German, Chinese), interactive language selection during install. No recurring API costs, runs fully local after initial model download (~1.5 GB). Combines faster-whisper-local-service and webchat-voice-proxy. Keywords: voice input, microphone, WebChat, speech to text, STT, local transcription, whisper, full stack, one-click, voice button, push-to-talk, PTT, keyboard shortcut, i18n.

neldar 1335