Official Verified media Safety 5/5

local-whisper

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High-quality transcription with multiple model sizes.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/araa47/local-whisper

What This Skill Does

The local-whisper skill gives the OpenClaw AI agent a speech-to-text (STT) interface built on OpenAI's Whisper models. Cloud transcription services send audio to external servers, which adds network latency and exposes sensitive recordings to a third party; this skill runs entirely on your machine instead. Once the model weights have been downloaded, it works without an internet connection. It takes audio files such as .wav, .mp3, or .m4a and produces text transcripts. Several model sizes are available, from the lightweight 'tiny' model for fast transcription on low-resource machines to 'large-v3' for the highest accuracy. This makes it a good fit for users who prioritize data sovereignty and local computation.
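As a rough illustration of the kind of call a skill like this wraps, the sketch below uses the openai-whisper Python package directly. The helper names (is_supported_audio, transcribe_file) and the extension list are assumptions made for this example; the skill's own command-line interface may look different.

```python
from pathlib import Path

# Extensions named in the description above; the real skill may accept more.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the listed audio types."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe one audio file locally and return the plain text."""
    if not is_supported_audio(path):
        raise ValueError(f"Unsupported audio type: {path}")
    # Imported lazily so the helper above works without the package installed.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # downloads weights on first use
    # fp16=False avoids the half-precision warning on CPU-only machines.
    result = model.transcribe(path, fp16=False)
    return result["text"].strip()
```

A call such as transcribe_file("meeting_notes.wav", model_name="tiny") would need the model weights (downloaded once) and ffmpeg available on the system, since openai-whisper uses ffmpeg to decode audio.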

Installation

To install the skill, run the following command in your terminal: clawhub install openclaw/skills/skills/araa47/local-whisper. The skill manages its own Python environment with uv, which keeps its dependencies isolated. Setup creates a .venv directory inside the skill folder and installs the required libraries, including click, openai-whisper, and torch. For a fresh install or a dependency update, navigate to ~/.clawdbot/skills/local-whisper and run the uv pip install command given in the skill's technical documentation, which links the CPU-optimized PyTorch wheel to the Python 3.12 environment.

Use Cases

  • Journaling: Automatically transcribe voice-recorded thoughts into text files for your personal database.
  • Meeting Summarization: Process long audio recordings of meetings or interviews to get a searchable text history.
  • Accessibility: Convert voice inputs into text to facilitate better interaction with command-line tools and scripts.
  • Archiving: Digitize old voice memos or analog recordings into structured text logs with precise timestamps.
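For the archiving case, a timestamped log can be built from the segment list that openai-whisper returns, where each segment carries "start", "end" (in seconds), and "text". The sketch below is one possible formatting, not the skill's actual output format; the function names and the log layout are assumptions.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_log(segments: list[dict]) -> list[str]:
    """Turn Whisper-style segments into one timestamped line per segment."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines


# Hand-written sample data in the shape of result["segments"].
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " This is an old voice memo."},
]
for line in segments_to_log(sample):
    print(line)
```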

Example Prompts

  • "Transcribe the file meeting_notes.wav using the turbo model for the best balance of speed and accuracy."
  • "Convert the audio recording from my interview into a JSON file, including timestamps for every word detected."
  • "Process the audio file 'daily_log.mp3' and save the output text into my current project directory."
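The JSON prompt above relies on word-level timing. With the openai-whisper package, calling transcribe with word_timestamps=True adds a "words" list to each segment. The sketch below flattens a result of that shape into JSON; the function name and the output schema are assumptions for illustration, and the skill may emit a different structure.

```python
import json


def words_to_json(result: dict) -> str:
    """Flatten per-segment word timings into a single JSON document."""
    words = []
    for segment in result.get("segments", []):
        # "words" is only present when word timestamps were requested.
        for w in segment.get("words", []):
            words.append(
                {"word": w["word"].strip(), "start": w["start"], "end": w["end"]}
            )
    payload = {"text": result.get("text", "").strip(), "words": words}
    return json.dumps(payload, indent=2)


# Hand-written sample in the shape of a transcribe(..., word_timestamps=True) result.
sample_result = {
    "text": " Hello world",
    "segments": [
        {
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.4},
                {"word": " world", "start": 0.4, "end": 0.9},
            ]
        }
    ],
}
print(words_to_json(sample_result))
```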

Tips & Limitations

For best performance, make sure your CPU has headroom. The 'tiny' and 'base' models run comfortably on modest hardware, while 'large-v3' needs substantial RAM and CPU time. Because the tool runs locally, speed depends entirely on your own hardware rather than on cloud servers. If transcription is slow, switch to the 'small' or 'base' model. Accuracy also varies: technical jargon and heavily accented speech are more likely to be mistranscribed. Review the output by hand when the transcript will be used for anything critical or sensitive.
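One way to apply this advice is to pick the largest model that fits in the memory you can spare. The sketch below is a heuristic only: the memory figures are the approximate values published in the openai-whisper README (stated there as GPU memory), and the 1.5x headroom factor and function name are assumptions to tune for your own machine.

```python
# Approximate memory needs in GB, smallest model first (rough README figures).
MODEL_MEMORY_GB = [
    ("tiny", 1.0),
    ("base", 1.0),
    ("small", 2.0),
    ("medium", 5.0),
    ("large-v3", 10.0),
]


def pick_model(available_gb: float, headroom: float = 1.5) -> str:
    """Return the largest model whose estimated need fits with headroom."""
    choice = "tiny"  # fallback when nothing fits comfortably
    for name, needed_gb in MODEL_MEMORY_GB:
        if needed_gb * headroom <= available_gb:
            choice = name
    return choice


print(pick_model(16.0))
print(pick_model(4.0))
```

The headroom factor leaves memory for the operating system and other processes; lowering it trades stability for a larger model.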

Metadata

Author: @araa47
Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-araa47-local-whisper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
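If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The sketch below merges the entry into an existing configuration instead. The file location, the function names, and the assumption that the file is plain JSON with a top-level "plugins" object are all illustrative and not confirmed by the registry page.

```python
import json
from pathlib import Path

PLUGIN_KEY = "official-araa47-local-whisper"


def enable_plugin(config: dict) -> dict:
    """Return a copy of config with the local-whisper entry enabled."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))  # keep any existing plugins
    plugins[PLUGIN_KEY] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path: str = "clawhub.json") -> None:
    """Read the config file (if present), add the entry, and write it back."""
    file = Path(path)  # hypothetical location; adjust to your setup
    config = json.loads(file.read_text()) if file.exists() else {}
    file.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```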

Tags (AI)

#speech-to-text #whisper #offline #transcription #audio-processing
Safety Score: 5/5

Flags: file-read, file-write, code-execution