Official Verified media Safety 5/5

local-whisper

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/araa47/local-whisper

What This Skill Does

The local-whisper skill gives the OpenClaw AI agent a speech-to-text (STT) interface built on OpenAI's Whisper models. Cloud transcription services send audio to external servers, which adds network latency and exposes sensitive recordings to a third party; this skill runs entirely on your machine instead. Once the model weights have been downloaded, it works without an internet connection. It takes audio files such as .wav, .mp3, or .m4a and produces text transcripts. Several model sizes are available, from the lightweight 'tiny' model for fast transcription on low-resource machines to 'large-v3' for the highest accuracy. That makes it a good fit for users who want their audio to stay on their own hardware.
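As a rough illustration of what happens under the hood, the sketch below calls the openai-whisper Python package directly. It is not the skill's own code: the function names `is_supported_audio` and `transcribe_file` are hypothetical, and the extension list simply mirrors the formats named above.

```python
from pathlib import Path

# Extensions named in the description above. Whisper decodes audio through
# ffmpeg, so other formats may work too; this list is only an assumption.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one the description lists."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def transcribe_file(path: str, model_name: str = "tiny") -> str:
    """Transcribe one audio file and return the plain-text transcript."""
    if not is_supported_audio(path):
        raise ValueError(f"unsupported audio format: {path}")
    # Imported lazily so the extension check works without the package installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(path)
    return result["text"].strip()
```

The first call for a given model size needs a network connection to fetch the weights; later calls read them from the local cache.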

Installation

To install the skill, run the following command in your terminal: clawhub install openclaw/skills/skills/araa47/local-whisper. The skill manages its own Python environment with uv, which keeps its dependencies isolated. Setup creates a .venv directory inside the skill folder and installs the required libraries, including click, openai-whisper, and torch. For a fresh install or a dependency update, go to ~/.clawdbot/skills/local-whisper and run the uv pip install command given in the technical documentation, which links the CPU-optimized PyTorch wheel to your Python 3.12 environment.
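One way to confirm that the environment ended up with the libraries listed above is a small import check, run with the Python interpreter inside the skill's .venv. This is a minimal sketch; `missing_modules` is a hypothetical helper, not part of the skill.

```python
import importlib.util

# Import names for the packages listed above.
# The openai-whisper distribution is imported as "whisper".
REQUIRED_MODULES = ("click", "whisper", "torch")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level import names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, reinstalling the dependencies as described above is the likely fix.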

Use Cases

Journaling: Automatically transcribe voice-recorded thoughts into text files for your personal database.
Meeting Summarization: Process long audio recordings of meetings or interviews to get a searchable text history.
Accessibility: Convert voice inputs into text to facilitate better interaction with command-line tools and scripts.
Archiving: Digitize old voice memos or analog recordings into structured text logs with precise timestamps.
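For the archiving case, timestamped logs can be built from the segment list that openai-whisper returns, where each segment carries `start`, `end` (in seconds), and `text`. The sketch below assumes that result shape; the function names and the log line layout are illustrative choices, not the skill's output format.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_log(segments) -> str:
    """Render Whisper-style segments as one timestamped line each."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return "\n".join(lines)
```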

Example Prompts

"Transcribe the file meeting_notes.wav using the turbo model for the best balance of speed and accuracy."
"Convert the audio recording from my interview into a JSON file, including timestamps for every word detected."
"Process the audio file 'daily_log.mp3' and save the output text into my current project directory."
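The JSON prompt above relies on per-word timing. In the openai-whisper package, passing `word_timestamps=True` to `transcribe` adds a `words` list to each segment, with `word`, `start`, and `end` fields. The sketch below flattens such a result into JSON; the output schema and the name `words_to_json` are assumptions made for illustration, and the skill may emit a different structure.

```python
import json


def words_to_json(result: dict) -> str:
    """Flatten a Whisper-style result with word timings into a JSON string."""
    words = []
    for segment in result.get("segments", []):
        # "words" is only present when word timestamps were requested.
        for w in segment.get("words", []):
            words.append(
                {"word": w["word"].strip(), "start": w["start"], "end": w["end"]}
            )
    payload = {"text": result.get("text", "").strip(), "words": words}
    return json.dumps(payload, indent=2)
```

A result produced without word timestamps still serializes; its `words` list is simply empty.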

Tips & Limitations

Transcription speed depends entirely on your own hardware, so make sure your CPU has headroom. The 'tiny' and 'base' models run comfortably on modest hardware, while 'large-v3' needs substantially more RAM and CPU time. If transcription is slow, switch to the 'small' or 'base' model. Accuracy also varies: technical jargon and heavily accented speech are more likely to be transcribed incorrectly. Review the transcript before relying on it for anything critical or sensitive.
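A simple way to decide whether to step down a model size is to compare processing time with audio length (the real-time factor). The sketch below is illustrative only: the size ordering is an assumption, 'medium' is a standard Whisper size not mentioned on this page, and 'turbo' is left out because it does not fit a strict size ordering.

```python
# Assumed ordering from smallest to largest; not taken from the skill itself.
MODEL_ORDER = ["tiny", "base", "small", "medium", "large-v3"]


def real_time_factor(elapsed_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return elapsed_seconds / audio_seconds


def next_smaller_model(name: str):
    """Return the next smaller model name, or None if already at the smallest.

    Raises ValueError for names that are not in MODEL_ORDER.
    """
    index = MODEL_ORDER.index(name)
    return MODEL_ORDER[index - 1] if index > 0 else None
```

For example, if a 60-second clip takes 180 seconds with 'small', the factor is 3.0 and `next_smaller_model("small")` suggests trying 'base'.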

Metadata

Author: @araa47

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-araa47-local-whisper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
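If clawhub.json already lists other plugins, pasting the snippet over the file would drop them. The sketch below shows one way to merge the entry into an existing configuration that has been loaded as a dictionary. `enable_plugin` is a hypothetical helper, and the location of clawhub.json is not specified here, so reading and writing the file is left to the caller.

```python
import copy

# Plugin key taken from the configuration snippet above.
PLUGIN_ID = "official-araa47-local-whisper"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of config with the plugin enabled, keeping other entries."""
    merged = copy.deepcopy(config)  # leave the caller's dictionary untouched
    plugins = merged.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    return merged
```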

Tags (AI)

#speech-to-text #whisper #offline #transcription #audio-processing

Safety Score: 5/5

Flags: file-read, file-write, code-execution

Related Skills

ez-unifi

Use when asked to manage UniFi network - list/restart/upgrade devices, block/unblock clients, manage WiFi networks, control PoE ports, manage traffic rules, create guest vouchers, or any UniFi controller task. Works with UDM Pro/SE, Dream Machine, Cloud Key Gen2+, or self-hosted controllers.

araa47 4473

ez-google

Use when asked to send email, check inbox, read emails, check calendar, schedule meetings, create events, search Google Drive, create Google Docs, read or write spreadsheets, find contacts, or any task involving Gmail, Google Calendar, Drive, Docs, Sheets, Slides, or Contacts. Agent-friendly with hosted OAuth - no API keys needed.

araa47 4473

gemini-stt

Transcribe audio files using Google's Gemini API or Vertex AI

araa47 4473

local-stt

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

araa47 4473

md-to-pdf

Convert markdown files to clean, formatted PDFs using reportlab

araa47 4473