whisper-mlx-local
Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.
Why use this skill?
Transcribe Telegram and WhatsApp voice messages for free using local MLX Whisper on your Mac. Private, fast, and no API costs.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/impkind/whisper-mlx-local
What This Skill Does
The whisper-mlx-local skill provides a robust, private, and cost-free speech-to-text solution for your OpenClaw agent, specifically optimized for Apple Silicon Macs. By leveraging the MLX framework, this tool runs OpenAI's Whisper model directly on your hardware, bypassing external API dependencies like OpenAI, Groq, or AssemblyAI. This allows you to transcribe audio files from messaging platforms like Telegram or WhatsApp without incurring per-minute usage fees, while ensuring your data never leaves your device.
Installation
First, confirm that your environment meets the requirements: macOS on Apple Silicon and Python 3.9+. Install the dependencies by running 'pip3 install -r requirements.txt' in the skill directory. Then start the local daemon with 'python3 scripts/daemon.py', which downloads the 1.5GB model. Once initialization finishes, update your '/.openclaw/openclaw.json' file to include the 'media.audio' configuration, pointing the CLI tool to your local transcription script. Finally, run 'openclaw gateway restart' to activate the skill. Optionally, load the included launch agent so that the daemon starts automatically on login.
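Before installing, the stated requirements (macOS on Apple Silicon, Python 3.9+) can be checked with a short script. This is a minimal sketch and not part of the skill itself: the function name meets_requirements is hypothetical, and the check assumes Python runs natively on the machine (under Rosetta, platform.machine() reports x86_64 instead of arm64).

```python
import platform
import sys


def meets_requirements(system, machine, version):
    """Return True for macOS on Apple Silicon with Python 3.9 or newer.

    Hypothetical helper for illustration; it is not shipped with the skill.
    """
    is_macos = system == "Darwin"            # platform.system() on macOS
    is_apple_silicon = machine == "arm64"    # native Apple Silicon Python
    has_python = tuple(version[:2]) >= (3, 9)
    return is_macos and is_apple_silicon and has_python


# Check the interpreter that is running this script.
ok = meets_requirements(platform.system(), platform.machine(), sys.version_info)
print("Environment OK" if ok else "Unsupported environment for whisper-mlx-local")
```

Passing the platform values in as arguments keeps the check easy to exercise on any machine, for example meets_requirements("Darwin", "arm64", (3, 11)).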
Use Cases
- Automating transcriptions for high-volume voice messaging channels on Telegram without subscription costs.
- Maintaining strict data privacy by processing sensitive audio messages offline on your local machine.
- Translating multilingual audio messages into English automatically using the provided translation flags.
- Integrating voice-command capabilities into your local agent environment for hands-free system interaction.
Example Prompts
- "OpenClaw, please transcribe the latest voice note I received in the Telegram tech channel."
- "Can you summarize the voice message that was just sent by John in the family group?"
- "Transcribe this audio file and translate the output to English: [path_to_audio_file]."
Tips & Limitations
The first time you run this skill, expect a noticeable delay (10-30 seconds) while the Whisper model loads into memory; subsequent transcriptions will be nearly instant. The model is approximately 1.5GB, so make sure you have sufficient free disk space. Although transcription is fast, performance can vary slightly with the specific Apple Silicon chip (for example, M1 vs M3). Always verify that the paths in your 'openclaw.json' file are accurate, as an incorrect path will prevent the gateway from finding the transcription script.
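The disk-space advice above can be turned into a quick check before the first daemon start. This is a rough sketch under stated assumptions: the 1.5GB figure is the approximate size quoted above, the 20% headroom factor is an arbitrary safety margin, and the script checks the volume holding the home directory, which may not be where the model is actually stored.

```python
import os
import shutil

# Approximate model size quoted in this listing (about 1.5GB).
MODEL_SIZE_BYTES = int(1.5 * 1024**3)


def has_room_for_model(free_bytes, model_bytes=MODEL_SIZE_BYTES, headroom=1.2):
    """Return True if free space covers the model plus a safety margin.

    The headroom factor is an assumption for illustration, not a
    requirement documented by the skill.
    """
    return free_bytes >= model_bytes * headroom


# Assumption: the model is stored on the same volume as the home directory.
free = shutil.disk_usage(os.path.expanduser("~")).free
print("Enough disk space" if has_room_for_model(free) else "Free up disk space first")
```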
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-impkind-whisper-mlx-local": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-read, code-execution
Related Skills
vta-memory
Reward and motivation system for AI agents. Dopamine-like wanting, not just doing. Part of the AI Brain series.
acc-error-memory
Error pattern tracking for AI agents. Detects corrections, escalates recurring mistakes, learns mitigations. The 'something's off' detector from the AI Brain series.
amygdala-memory
Emotional processing layer for AI agents. Persistent emotional states that influence behavior and responses. Part of the AI Brain series.
anterior-cingulate-memory
Conflict detection and error monitoring for AI agents. The 'something's off' detector. Part of the AI Brain series.
basal-ganglia-memory
Habit formation and procedural learning for AI agents. Develop preferences and shortcuts through repetition. Part of the AI Brain series.