Official | Verified | media | Safety 5/5

local-stt

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

What This Skill Does

The local-stt skill gives OpenClaw private, offline speech-to-text. It runs on ONNX Runtime with int8 quantization, so transcription needs no cloud API connection. Two backends are available: Parakeet, which is optimized for high-accuracy English transcription and understands filler words, and Whisper, which supports 99 languages. Audio files are processed quickly and entirely on the host machine, so sensitive voice data never leaves it.
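
To make the int8 trade-off concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. It is a generic illustration only: the function names and sample weights are hypothetical, and nothing here is taken from the skill's or ONNX Runtime's actual quantization scheme.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole list; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [i * scale for i in q]


# Hypothetical weights, chosen only to show the rounding error.
weights = [0.8213, -0.3071, 0.0549, -1.27, 0.4436]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

Each value is stored in one byte instead of four, and the round-trip error is bounded by half the scale, which is why a small loss of precision relative to full-precision models is expected.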

Installation

To install this skill, use the clawhub CLI utility. Run the following command in your terminal:

clawhub install openclaw/skills/skills/araa47/local-stt

Once installed, make sure your environment meets the dependency requirements for ONNX Runtime. To verify the installation, run ~/.openclaw/skills/local-stt/scripts/local-stt.py --help and confirm that the backend switches and model configurations are listed.
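
The verification step can also be scripted. The sketch below assumes the script lives at the path given above and that it can be launched with the current Python interpreter; that launch method is an assumption, since the page only shows the script being run directly. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script location as given in the installation notes above.
SCRIPT = Path.home() / ".openclaw" / "skills" / "local-stt" / "scripts" / "local-stt.py"


def help_command(script=SCRIPT):
    """Build the argv that runs the skill script with --help.

    Assumption: the script runs under the current interpreter.
    """
    return [sys.executable, str(script), "--help"]


def verify_install(script=SCRIPT):
    """Return True if the script exists and --help exits with status 0."""
    if not Path(script).is_file():
        return False
    result = subprocess.run(help_command(script), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    print("local-stt installed:", verify_install())
```

A False result means either the file is missing or the help call failed, for example because an ONNX Runtime dependency is not installed.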

Use Cases

This skill suits users who prioritize privacy and low latency. Common use cases include:

  • Transcribing private meeting audio or voice memos locally without uploading files to third-party servers.
  • Building automated audio processing pipelines where speed is critical, such as monitoring incoming voice messages in Matrix rooms.
  • Creating local documentation from recorded interviews, leveraging the high accuracy of Parakeet v2.
  • Transcribing in multilingual environments, where Whisper can quickly switch between different languages.

Example Prompts

  1. "OpenClaw, transcribe the latest audio file in my downloads folder using the Parakeet v2 engine."
  2. "Please process the voice note in the project room using Whisper base to get a quick summary."
  3. "Transcribe my last voice recording, but use the large-v3-turbo model for the best possible accuracy."

Tips & Limitations

The local-stt skill involves some trade-offs. The Parakeet engine is recommended for English-only tasks where accuracy is the primary goal, such as transcribing technical notes or dictated commands. If your content includes multiple languages or you need a smaller footprint, the Whisper backend is the better choice, in particular the 'tiny' model when system resources are constrained. int8 quantization improves performance, but it may marginally reduce precision compared to full-precision models. If you intend to pipe results directly to a specific communication channel, always check the --room-id flag.
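
The rule of thumb above can be written down as a small decision helper. This is a sketch only: `Choice` and `choose_backend` are hypothetical names, not part of the skill, and the returned strings are labels for illustration, not confirmed CLI values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Choice:
    backend: str            # "parakeet" or "whisper" (illustrative labels)
    model: Optional[str]    # a Whisper model size when the tips imply one


def choose_backend(english_only, low_resources=False):
    """Apply the trade-offs described in the tips above."""
    if low_resources:
        # Constrained systems: the smallest Whisper model.
        return Choice("whisper", "tiny")
    if english_only:
        # English-only content where accuracy comes first.
        return Choice("parakeet", None)
    # Multilingual content.
    return Choice("whisper", None)


print(choose_backend(english_only=True))
print(choose_backend(english_only=False))
print(choose_backend(english_only=True, low_resources=True))
```

The resource check comes first because the tips recommend Whisper 'tiny' whenever the footprint matters, regardless of language.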

Metadata

Author: @araa47
Stars: 4473
Views: 1
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-araa47-local-stt": {
      "enabled": true,
      "auto_update": true
    }
  }
}
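
If you would rather merge the entry programmatically than paste it by hand, a sketch along these lines could work. It assumes clawhub.json is an ordinary JSON file with a top-level "plugins" object, as the snippet above suggests; the helper names are hypothetical, and the file's location is not stated on this page, so the path is left to the caller.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-araa47-local-stt"
PLUGIN_ENTRY = {"enabled": True, "auto_update": True}


def enable_plugin(config):
    """Return a copy of config with the local-stt plugin entry added."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))  # keep any existing plugins
    plugins[PLUGIN_ID] = dict(PLUGIN_ENTRY)
    merged["plugins"] = plugins
    return merged


def update_file(path):
    """Read the config file if it exists, add the entry, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

Calling `update_file("clawhub.json")` from the directory that holds your configuration leaves other plugin entries untouched and overwrites only the local-stt entry.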

Tags (AI)

#stt #speech-to-text #transcription #privacy #onnx
Safety Score: 5/5

Flags: file-read, code-execution