Official Verified media Safety 5/5

whisper-mlx-local

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

Why use this skill?

Transcribe Telegram and WhatsApp voice messages for free using local MLX Whisper on your Mac. Private, fast, and no API costs.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/impkind/whisper-mlx-local

What This Skill Does

The whisper-mlx-local skill provides a robust, private, and cost-free speech-to-text solution for your OpenClaw agent, specifically optimized for Apple Silicon Macs. By leveraging the MLX framework, this tool runs OpenAI's Whisper model directly on your hardware, bypassing external API dependencies like OpenAI, Groq, or AssemblyAI. This allows you to transcribe audio files from messaging platforms like Telegram or WhatsApp without incurring per-minute usage fees, while ensuring your data never leaves your device.

Installation

First, confirm that your environment meets the requirements: macOS on Apple Silicon and Python 3.9+. Install the dependencies by running 'pip3 install -r requirements.txt' in the skill directory. Then start the local daemon with 'python3 scripts/daemon.py'; on first run it downloads the ~1.5GB model. Once initialization finishes, add the 'media.audio' configuration to your '~/.openclaw/openclaw.json' file, pointing the CLI tool at your local transcription script. Finally, run 'openclaw gateway restart' to activate the skill. Optionally, load the included launch agent so the daemon starts automatically at login.
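The requirements above (macOS, Apple Silicon, Python 3.9+) can be checked before installing anything. The following is a minimal sketch using only the Python standard library; the function name check_prerequisites is hypothetical and not part of the skill.

```python
import platform
import sys


def check_prerequisites(system, machine, version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # macOS reports its kernel name "Darwin" via platform.system().
    if system != "Darwin":
        problems.append(f"macOS required, found {system or 'unknown'}")
    # Native Apple Silicon Python reports "arm64"; a Python running under
    # Rosetta reports "x86_64" and would fail this check.
    if machine != "arm64":
        problems.append(f"Apple Silicon (arm64) required, found {machine or 'unknown'}")
    if tuple(version[:2]) < (3, 9):
        problems.append(f"Python 3.9+ required, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites(platform.system(), platform.machine(), sys.version_info)
    for issue in issues:
        print("FAIL:", issue)
    if not issues:
        print("OK: environment meets the stated requirements")
```

Passing the platform values in as arguments, rather than reading them inside the function, keeps the check easy to exercise on any machine.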

Use Cases

Automating transcriptions for high-volume voice messaging channels on Telegram without subscription costs.
Maintaining strict data privacy by processing sensitive audio messages offline on your local machine.
Translating multilingual audio messages into English automatically using the provided translation flags.
Integrating voice-command capabilities into your local agent environment for hands-free system interaction.

Example Prompts

"OpenClaw, please transcribe the latest voice note I received in the Telegram tech channel."
"Can you summarize the voice message that was just sent by John in the family group?"
"Transcribe this audio file and translate the output to English: [path_to_audio_file]."

Tips & Limitations

The first time you run this skill, expect a delay of 10-30 seconds while the Whisper model loads into memory; subsequent transcriptions are nearly instant. The model is approximately 1.5GB, so make sure you have enough free disk space. Transcription is fast, but speed varies somewhat with the specific Apple Silicon chip (for example, M1 vs. M3). Always verify that the paths in your 'openclaw.json' file are correct; an incorrect path prevents the gateway from finding the transcription script.
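Two of the checks above, free disk space and path accuracy, are easy to script. This is a rough sketch using the standard library only; the helper names are hypothetical, the 1.5GB figure is the approximate size quoted above, and the path list in the usage section is a placeholder for whatever script path you actually put in 'openclaw.json'.

```python
import os
import shutil

# Approximate model size quoted in the text (assumption: 1.5 GiB).
MODEL_SIZE_BYTES = int(1.5 * 1024**3)


def has_room_for_model(free_bytes, needed_bytes=MODEL_SIZE_BYTES):
    """True if the free space is at least the approximate model size."""
    return free_bytes >= needed_bytes


def missing_paths(paths):
    """Return the subset of paths that do not exist after expanding '~'."""
    return [p for p in paths if not os.path.exists(os.path.expanduser(p))]


if __name__ == "__main__":
    free = shutil.disk_usage(os.path.expanduser("~")).free
    print("enough disk space:", has_room_for_model(free))
    # Placeholder list: add the transcription script path from your config.
    print("missing:", missing_paths(["~/.openclaw/openclaw.json"]))
```

An empty "missing" list only shows that the files exist; it does not confirm that the gateway is configured to use them.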

Metadata

Author: @impkind

Stars: 2287

Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-impkind-whisper-mlx-local": {
      "enabled": true,
      "auto_update": true
    }
  }
}
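If your clawhub.json already contains other plugins, pasting the snippet by hand risks overwriting them. The sketch below merges the entry instead; it assumes the file is a JSON object with a top-level "plugins" key as shown above, and the function names and the file location passed to update_file are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-impkind-whisper-mlx-local"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of config with the plugin entry enabled under "plugins"."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    # Preserve any extra keys already stored for this plugin.
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": True})
    plugins[plugin_id] = entry
    merged["plugins"] = plugins
    return merged


def update_file(path):
    """Read the JSON file at path (or start empty), merge the entry, write it back."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Calling update_file with the path to your clawhub.json leaves other plugin entries and unrelated top-level keys untouched.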

Tags(AI)

#whisper #mlx #transcription #privacy #audio

Safety Score: 5/5

Flags: file-read, code-execution

Related Skills

vta-memory

Reward and motivation system for AI agents. Dopamine-like wanting, not just doing. Part of the AI Brain series.

impkind 2287

acc-error-memory

Error pattern tracking for AI agents. Detects corrections, escalates recurring mistakes, learns mitigations. The 'something's off' detector from the AI Brain series.

impkind 2287

amygdala-memory

Emotional processing layer for AI agents. Persistent emotional states that influence behavior and responses. Part of the AI Brain series.

impkind 2287

anterior-cingulate-memory

Conflict detection and error monitoring for AI agents. The 'something's off' detector. Part of the AI Brain series.

impkind 2287

basal-ganglia-memory

Habit formation and procedural learning for AI agents. Develop preferences and shortcuts through repetition. Part of the AI Brain series.

impkind 2287