ClawKit Logo
ClawKitReliability Toolkit
Back to Registry
Official Verified

auto-whisper-safe

RAM-safe voice transcription with auto-chunking — works on 16GB machines without crashes

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/neal-collab/auto-whisper-safe
Or

Auto-Whisper Safe — RAM-Friendly Voice Transcription

Transcribe voice messages and long audio files using OpenAI Whisper without crashing your machine. Designed for 16GB RAM systems running other processes (like OpenClaw agents).

The Problem

Whisper's turbo and large models use 6-10GB RAM. On a 16GB machine running OpenClaw + Ollama + other services, this causes OOM crashes. Existing Whisper skills don't handle this.

The Solution

  1. Auto-detects audio length via ffprobe
  2. Splits long audio (>10min) into 10-min chunks automatically
  3. Uses base model by default (~1.5GB RAM — safe on any 16GB machine)
  4. Merges transcripts seamlessly — no gaps, no duplicates
  5. Cleans up temp files automatically

Usage

# Basic usage
./transcribe.sh /path/to/audio.ogg

# Custom model (if you have more RAM)
WHISPER_MODEL=small ./transcribe.sh /path/to/audio.ogg

# Custom language
WHISPER_LANG=en ./transcribe.sh /path/to/audio.ogg

# Custom output directory
./transcribe.sh /path/to/audio.ogg /path/to/output/

RAM Usage by Model

ModelRAMSpeedAccuracyRecommended For
tiny~1GB⚡⚡⚡★★Quick previews, low-RAM systems
base~1.5GB⚡⚡★★★Default — best balance
small~2.5GB★★★★When accuracy matters more
medium~5GB🐢★★★★★32GB+ RAM only
turbo~6GB🐢🐢★★★★★Dedicated transcription machines

OpenClaw Integration

Add to your agent's BOOTSTRAP.md:

## Voice Message Handling

When you receive `<media:audio>`, ALWAYS transcribe first:

1. Run: `./skills/auto-whisper-safe/transcribe.sh <audio-path>`
2. Read the output transcript file
3. Respond based on the transcribed content

Do this automatically — voice messages are meant to be transcribed.

Environment Variables

VariableDefaultDescription
WHISPER_MODELbaseWhisper model size
WHISPER_LANGenAudio language (ISO code)

How Chunking Works

  • Audio ≤10min → transcribed directly (no splitting)
  • Audio >10min → split into 10-min segments via ffmpeg
  • Each segment transcribed independently
  • Transcripts concatenated in order
  • Temp files cleaned up on exit (even on errors)

Installation

# macOS
brew install openai-whisper ffmpeg

# Ubuntu/Debian
pip install openai-whisper
apt install ffmpeg

# Verify
whisper --help && ffmpeg -version

Why This Over Other Whisper Skills

  • RAM-safe: Won't crash your 16GB machine
  • Auto-chunking: Handles 1-hour podcasts without issues
  • Cleanup: No temp files left behind
  • Progress: Shows chunk-by-chunk progress
  • Configurable: Model + language via env vars
  • OpenClaw-native: Drop-in for any agent's BOOTSTRAP.md

Real-World Performance

Tested on Ubuntu 22.04, 16GB RAM, running OpenClaw (10 agents) + Ollama simultaneously:

Metadata

Stars1335
Views0
Updated2026-02-23
View Author Profile
AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-neal-collab-auto-whisper-safe": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#whisper#transcription#voice#audio#ram-safe
Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.