transcribe
Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/javicasper/transcribe
Transcribe
Local audio transcription using faster-whisper in Docker.
Installation
cd /path/to/skills/transcribe/scripts
chmod +x install.sh
./install.sh
This builds the Docker image whisper:local and installs the transcribe CLI.
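To confirm that both artifacts exist after running the installer, a check along the following lines may help. This is a minimal sketch: the function name `check_transcribe_install` is hypothetical and not part of the skill, and it assumes the `docker` CLI is on your PATH.

```shell
# Hypothetical helper (not shipped with the skill): verifies that the
# transcribe CLI and the whisper:local image are both present.
check_transcribe_install() {
  local status=0

  # The installer is described as installing a `transcribe` command.
  if ! command -v transcribe >/dev/null 2>&1; then
    echo "transcribe CLI not found on PATH" >&2
    status=1
  fi

  # `docker image inspect` exits non-zero when the image does not exist.
  if ! docker image inspect whisper:local >/dev/null 2>&1; then
    echo "Docker image whisper:local not found" >&2
    status=1
  fi

  return "$status"
}
```

Calling `check_transcribe_install && echo "ready"` prints `ready` only when both checks pass.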
Usage
transcribe /path/to/audio.mp3 [language]
- Default language: es (Spanish)
- Use auto for auto-detection
- Outputs plain text to stdout
Examples
transcribe /tmp/voice.ogg # Spanish (default)
transcribe /tmp/meeting.mp3 en # English
transcribe /tmp/audio.m4a auto # Auto-detect
Supported Formats
mp3, m4a, ogg, wav, webm, flac, aac
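If you want to reject unsupported files before starting the container, the extension list above can be checked up front. The helper below is a sketch under that assumption; `is_supported_audio` is a hypothetical name, and whether the `transcribe` CLI performs its own format check is not stated here.

```shell
# Hypothetical helper: succeeds only if the file extension appears in the
# supported-formats list. It checks the name only, not the file contents.
is_supported_audio() {
  local file="$1" ext

  # A name without a dot has no extension to check.
  case "$file" in
    *.*) ;;
    *) return 1 ;;
  esac

  # Take the text after the last dot and lowercase it, so VOICE.OGG passes.
  ext="$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')"

  case "$ext" in
    mp3|m4a|ogg|wav|webm|flac|aac) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_audio /tmp/voice.ogg && transcribe /tmp/voice.ogg` only calls the CLI for a listed extension.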
When Receiving Voice Messages
- Save the audio attachment to a temp file
- Run transcribe <path>
- Include the transcription in your response
- Clean up the temp file
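The steps above can be sketched as a single shell function. This is an illustration, not code from the skill: `transcribe_attachment` and the `TRANSCRIBE_CMD` override are hypothetical names, and copying a local file stands in for however your platform delivers the attachment.

```shell
# Hypothetical wrapper around the voice-message workflow.
# TRANSCRIBE_CMD is an assumed override hook; it defaults to the
# installed `transcribe` CLI.
transcribe_attachment() {
  local src="$1" lang="${2:-es}"   # es mirrors the CLI's documented default
  local name="${src##*/}"          # file name without the directory part
  local workdir status

  # 1. Save the audio attachment to a temp location, keeping its
  #    original name and extension.
  workdir="$(mktemp -d "${TMPDIR:-/tmp}/transcribe.XXXXXX")" || return 1
  if ! cp "$src" "$workdir/$name"; then
    rm -rf "$workdir"
    return 1
  fi

  # 2. Run the transcription; the plain text goes to stdout, so the
  #    caller can capture it and include it in a response.
  "${TRANSCRIBE_CMD:-transcribe}" "$workdir/$name" "$lang"
  status=$?

  # 3. Clean up the temp copy whether or not transcription succeeded.
  rm -rf "$workdir"
  return "$status"
}
```

A caller might capture the text with `text="$(transcribe_attachment /tmp/voice.ogg auto)"` and then quote `$text` in the reply.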
Files
- scripts/transcribe - CLI wrapper (bash)
- scripts/install.sh - Installation script (includes Dockerfile inline)
Notes
- Model: small (fast) - edit install.sh for large-v3 (accurate)
- Fully local, no API key needed
Paste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-javicasper-transcribe": {
"enabled": true,
"auto_update": true
}
}
}