auto-whisper-safe
RAM-safe voice transcription with auto-chunking — works on 16GB machines without crashes
Install via CLI (Recommended)
```shell
clawhub install openclaw/skills/skills/neal-collab/auto-whisper-safe
```

Auto-Whisper Safe — RAM-Friendly Voice Transcription
Transcribe voice messages and long audio files using OpenAI Whisper without crashing your machine. Designed for 16GB RAM systems running other processes (like OpenClaw agents).
The Problem
Whisper's turbo and large models use 6-10GB RAM. On a 16GB machine running OpenClaw + Ollama + other services, this causes OOM crashes. Existing Whisper skills don't handle this.
The Solution
- Auto-detects audio length via ffprobe
- Splits long audio (>10min) into 10-min chunks automatically
- Uses the `base` model by default (~1.5GB RAM — safe on any 16GB machine)
- Merges transcripts seamlessly — no gaps, no duplicates
- Cleans up temp files automatically
Usage
```shell
# Basic usage
./transcribe.sh /path/to/audio.ogg

# Custom model (if you have more RAM)
WHISPER_MODEL=small ./transcribe.sh /path/to/audio.ogg

# Custom language
WHISPER_LANG=en ./transcribe.sh /path/to/audio.ogg

# Custom output directory
./transcribe.sh /path/to/audio.ogg /path/to/output/
```
RAM Usage by Model
| Model | RAM | Speed | Accuracy | Recommended For |
|---|---|---|---|---|
| `tiny` | ~1GB | ⚡⚡⚡ | ★★ | Quick previews, low-RAM systems |
| `base` | ~1.5GB | ⚡⚡ | ★★★ | Default — best balance ✅ |
| `small` | ~2.5GB | ⚡ | ★★★★ | When accuracy matters more |
| `medium` | ~5GB | 🐢 | ★★★★★ | 32GB+ RAM only |
| `turbo` | ~6GB | 🐢🐢 | ★★★★★ | Dedicated transcription machines |
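To make the table actionable, here is a hypothetical helper (not part of transcribe.sh) that picks the largest model fitting the currently available RAM, using the table's figures plus roughly 2GB of headroom for everything else on the machine. The thresholds and the `pick_model` name are assumptions; tune them for your own workload.

```shell
# pick_model: map available RAM (MB) to the largest safe Whisper model,
# per the RAM-usage table above plus ~2GB headroom (assumed margins).
pick_model() {
  local avail_mb=$1
  if   [ "$avail_mb" -ge 8000 ]; then echo turbo
  elif [ "$avail_mb" -ge 7000 ]; then echo medium
  elif [ "$avail_mb" -ge 4500 ]; then echo small
  elif [ "$avail_mb" -ge 3500 ]; then echo base
  else echo tiny
  fi
}

# Linux example: derive headroom from /proc/meminfo, then run:
# avail_mb=$(( $(awk '/MemAvailable/ {print $2}' /proc/meminfo) / 1024 ))
# WHISPER_MODEL=$(pick_model "$avail_mb") ./transcribe.sh audio.ogg
```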
OpenClaw Integration
Add to your agent's BOOTSTRAP.md:
```markdown
## Voice Message Handling

When you receive `<media:audio>`, ALWAYS transcribe first:

1. Run: `./skills/auto-whisper-safe/transcribe.sh <audio-path>`
2. Read the output transcript file
3. Respond based on the transcribed content

Do this automatically — voice messages are meant to be transcribed.
```
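The three steps above can also be wired into a plain shell hook. The transcript location below is an assumption (audio.ogg becoming audio.txt alongside the input), not transcribe.sh's documented behavior; check where the script actually writes its output.

```shell
# Hypothetical wrapper for steps 1-3. The audio.ogg -> audio.txt
# naming is an assumption, not guaranteed by transcribe.sh.
transcript_path() {
  local audio=$1
  echo "${audio%.*}.txt"   # strip the last extension, append .txt
}

# audio="$1"
# ./skills/auto-whisper-safe/transcribe.sh "$audio"   # step 1
# cat "$(transcript_path "$audio")"                   # step 2
```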
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `WHISPER_MODEL` | `base` | Whisper model size |
| `WHISPER_LANG` | `en` | Audio language (ISO code) |
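The defaults in the table presumably apply via shell parameter-expansion fallbacks. A sketch of that pattern (an illustration, not transcribe.sh's actual code):

```shell
# Fallback pattern: use the env var if set, else the table's default.
MODEL="${WHISPER_MODEL:-base}"
LANG_CODE="${WHISPER_LANG:-en}"
echo "model=$MODEL lang=$LANG_CODE"
```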
How Chunking Works
- Audio ≤10min → transcribed directly (no splitting)
- Audio >10min → split into 10-min segments via ffmpeg
- Each segment transcribed independently
- Transcripts concatenated in order
- Temp files cleaned up on exit (even on errors)
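The steps above can be sketched as follows. `chunk_count` is a hypothetical helper, and the commented ffprobe/ffmpeg invocations are one plausible way to implement the probe-and-split step, not necessarily what transcribe.sh does internally.

```shell
CHUNK_SECS=600   # 10-minute segments

# ceil(duration / CHUNK_SECS); a result of 1 means no splitting needed
chunk_count() {
  local dur=$1
  echo $(( (dur + CHUNK_SECS - 1) / CHUNK_SECS ))
}

# Duration via ffprobe, then split with ffmpeg's segment muxer
# (stream copy, so no re-encoding):
# dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$1" | cut -d. -f1)
# if [ "$(chunk_count "$dur")" -gt 1 ]; then
#   tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT   # cleanup even on error
#   ffmpeg -i "$1" -f segment -segment_time "$CHUNK_SECS" -c copy "$tmp/chunk_%03d.ogg"
# fi
```

Stream copy (`-c copy`) keeps the split fast and lossless; cutting only at existing keyframes is acceptable here since a few seconds of drift at a chunk boundary rarely affects speech transcription.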
Installation
```shell
# macOS
brew install openai-whisper ffmpeg

# Ubuntu/Debian
pip install openai-whisper
apt install ffmpeg

# Verify
whisper --help && ffmpeg -version
```
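A quick pre-flight check before first use; the `have` helper below just wraps `command -v` and reports any of the three tools missing from `PATH`:

```shell
# Verify all dependencies are installed before running transcribe.sh.
have() { command -v "$1" >/dev/null 2>&1; }

for dep in whisper ffmpeg ffprobe; do
  have "$dep" || echo "missing dependency: $dep (see Installation above)" >&2
done
```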
Why This Over Other Whisper Skills
- ✅ RAM-safe: Won't crash your 16GB machine
- ✅ Auto-chunking: Handles 1-hour podcasts without issues
- ✅ Cleanup: No temp files left behind
- ✅ Progress: Shows chunk-by-chunk progress
- ✅ Configurable: Model + language via env vars
- ✅ OpenClaw-native: Drop-in for any agent's BOOTSTRAP.md
Real-World Performance
Tested on Ubuntu 22.04, 16GB RAM, running OpenClaw (10 agents) + Ollama simultaneously.
Paste this into your clawhub.json to enable this plugin.
```json
{
  "plugins": {
    "official-neal-collab-auto-whisper-safe": {
      "enabled": true,
      "auto_update": true
    }
  }
}
```