auto-whisper-safe
RAM-safe voice transcription with auto-chunking — works on 16GB machines without crashes
Install via CLI (Recommended)
```shell
clawhub install openclaw/skills/skills/neal-collab/auto-whisper-safe
```

Auto-Whisper Safe — RAM-Friendly Voice Transcription
Transcribe voice messages and long audio files using OpenAI Whisper without crashing your machine. Designed for 16GB RAM systems running other processes (like OpenClaw agents).
The Problem
Whisper's turbo and large models use 6-10GB RAM. On a 16GB machine running OpenClaw + Ollama + other services, this causes OOM crashes. Existing Whisper skills don't handle this.
The Solution
- Auto-detects audio length via ffprobe
- Splits long audio (>10min) into 10-min chunks automatically
- Uses the `base` model by default (~1.5GB RAM — safe on any 16GB machine)
- Merges transcripts seamlessly — no gaps, no duplicates
- Cleans up temp files automatically
Usage
```shell
# Basic usage
./transcribe.sh /path/to/audio.ogg

# Custom model (if you have more RAM)
WHISPER_MODEL=small ./transcribe.sh /path/to/audio.ogg

# Custom language
WHISPER_LANG=en ./transcribe.sh /path/to/audio.ogg

# Custom output directory
./transcribe.sh /path/to/audio.ogg /path/to/output/
```
RAM Usage by Model
| Model | RAM | Speed | Accuracy | Recommended For |
|---|---|---|---|---|
| `tiny` | ~1GB | ⚡⚡⚡ | ★★ | Quick previews, low-RAM systems |
| `base` | ~1.5GB | ⚡⚡ | ★★★ | Default — best balance ✅ |
| `small` | ~2.5GB | ⚡ | ★★★★ | When accuracy matters more |
| `medium` | ~5GB | 🐢 | ★★★★★ | 32GB+ RAM only |
| `turbo` | ~6GB | 🐢🐢 | ★★★★★ | Dedicated transcription machines |
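To make the table actionable, here is a hypothetical helper (not part of transcribe.sh) that picks the largest model fitting the currently available RAM, using the table's figures plus roughly 2GB of headroom for everything else on the machine. The thresholds and the `pick_model` name are assumptions; tune them for your own workload.

```shell
# pick_model: map available RAM (MB) to the largest safe Whisper model,
# per the RAM-usage table above plus ~2GB headroom (assumed margins).
pick_model() {
  local avail_mb=$1
  if   [ "$avail_mb" -ge 8000 ]; then echo turbo
  elif [ "$avail_mb" -ge 7000 ]; then echo medium
  elif [ "$avail_mb" -ge 4500 ]; then echo small
  elif [ "$avail_mb" -ge 3500 ]; then echo base
  else echo tiny
  fi
}

# Linux example: derive headroom from /proc/meminfo, then run:
# avail_mb=$(( $(awk '/MemAvailable/ {print $2}' /proc/meminfo) / 1024 ))
# WHISPER_MODEL=$(pick_model "$avail_mb") ./transcribe.sh audio.ogg
```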
OpenClaw Integration
Add to your agent's BOOTSTRAP.md:
```markdown
## Voice Message Handling

When you receive `<media:audio>`, ALWAYS transcribe first:

1. Run: `./skills/auto-whisper-safe/transcribe.sh <audio-path>`
2. Read the output transcript file
3. Respond based on the transcribed content

Do this automatically — voice messages are meant to be transcribed.
```
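The three steps above can also be wired into a plain shell hook. The transcript location below is an assumption (audio.ogg becoming audio.txt alongside the input), not transcribe.sh's documented behavior; check where the script actually writes its output.

```shell
# Hypothetical wrapper for steps 1-3. The audio.ogg -> audio.txt
# naming is an assumption, not guaranteed by transcribe.sh.
transcript_path() {
  local audio=$1
  echo "${audio%.*}.txt"   # strip the last extension, append .txt
}

# audio="$1"
# ./skills/auto-whisper-safe/transcribe.sh "$audio"   # step 1
# cat "$(transcript_path "$audio")"                   # step 2
```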
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `WHISPER_MODEL` | `base` | Whisper model size |
| `WHISPER_LANG` | `en` | Audio language (ISO code) |
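The defaults in the table presumably apply via shell parameter-expansion fallbacks. A sketch of that pattern (an illustration, not transcribe.sh's actual code):

```shell
# Fallback pattern: use the env var if set, else the table's default.
MODEL="${WHISPER_MODEL:-base}"
LANG_CODE="${WHISPER_LANG:-en}"
echo "model=$MODEL lang=$LANG_CODE"
```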
How Chunking Works
- Audio ≤10min → transcribed directly (no splitting)
- Audio >10min → split into 10-min segments via ffmpeg
- Each segment transcribed independently
- Transcripts concatenated in order
- Temp files cleaned up on exit (even on errors)
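The steps above can be sketched as follows. `chunk_count` is a hypothetical helper, and the commented ffprobe/ffmpeg invocations are one plausible way to implement the probe-and-split step, not necessarily what transcribe.sh does internally.

```shell
CHUNK_SECS=600   # 10-minute segments

# ceil(duration / CHUNK_SECS); a result of 1 means no splitting needed
chunk_count() {
  local dur=$1
  echo $(( (dur + CHUNK_SECS - 1) / CHUNK_SECS ))
}

# Duration via ffprobe, then split with ffmpeg's segment muxer
# (stream copy, so no re-encoding):
# dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$1" | cut -d. -f1)
# if [ "$(chunk_count "$dur")" -gt 1 ]; then
#   tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT   # cleanup even on error
#   ffmpeg -i "$1" -f segment -segment_time "$CHUNK_SECS" -c copy "$tmp/chunk_%03d.ogg"
# fi
```

Stream copy (`-c copy`) keeps the split fast and lossless; cutting only at existing keyframes is acceptable here since a few seconds of drift at a chunk boundary rarely affects speech transcription.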
Installation
```shell
# macOS
brew install openai-whisper ffmpeg

# Ubuntu/Debian
pip install openai-whisper
apt install ffmpeg

# Verify
whisper --help && ffmpeg -version
```
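A quick pre-flight check before first use; the `have` helper below just wraps `command -v` and reports any of the three tools missing from `PATH`:

```shell
# Verify all dependencies are installed before running transcribe.sh.
have() { command -v "$1" >/dev/null 2>&1; }

for dep in whisper ffmpeg ffprobe; do
  have "$dep" || echo "missing dependency: $dep (see Installation above)" >&2
done
```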
Why This Over Other Whisper Skills
- ✅ RAM-safe: Won't crash your 16GB machine
- ✅ Auto-chunking: Handles 1-hour podcasts without issues
- ✅ Cleanup: No temp files left behind
- ✅ Progress: Shows chunk-by-chunk progress
- ✅ Configurable: Model + language via env vars
- ✅ OpenClaw-native: Drop-in for any agent's BOOTSTRAP.md
Real-World Performance
Tested on Ubuntu 22.04, 16GB RAM, running OpenClaw (10 agents) + Ollama simultaneously.
Paste this into your clawhub.json to enable this plugin.
```json
{
  "plugins": {
    "official-neal-collab-auto-whisper-safe": {
      "enabled": true,
      "auto_update": true
    }
  }
}
```