Official | Verified | media | Safety 5/5

whisper-mlx-local

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

Why use this skill?

Transcribe Telegram and WhatsApp voice messages for free using local MLX Whisper on your Mac. Private, fast, and no API costs.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/impkind/whisper-mlx-local

What This Skill Does

The whisper-mlx-local skill gives your OpenClaw agent private, cost-free speech-to-text, optimized for Apple Silicon Macs. Using Apple's MLX framework, it runs OpenAI's Whisper model directly on your hardware instead of calling hosted transcription APIs such as OpenAI, Groq, or AssemblyAI. You can transcribe voice messages from platforms like Telegram or WhatsApp without per-minute usage fees, and the audio is transcribed on your Mac rather than being sent to a third-party transcription service.

Installation

Getting started takes a few steps:

  1. Confirm your environment meets the requirements: macOS on Apple Silicon and Python 3.9 or later.
  2. In the skill directory, install the dependencies with 'pip3 install -r requirements.txt'.
  3. Start the local daemon with 'python3 scripts/daemon.py'. On its first run it downloads the Whisper model (about 1.5GB).
  4. Add a 'media.audio' configuration to your '/.openclaw/openclaw.json' file that points the CLI tool at the skill's local transcription script.
  5. Run 'openclaw gateway restart' to activate the skill.

Optionally, load the included launch agent so the daemon starts automatically when you log in.
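The environment requirements (macOS on Apple Silicon, Python 3.9+) can be checked before installing anything. The following is a minimal sketch using only Python's standard library; the function name meets_requirements is hypothetical and not part of the skill. It also catches a common pitfall: a Python running under Rosetta on an Apple Silicon Mac reports "x86_64" rather than "arm64", and MLX needs a native arm64 Python.

```python
import platform
import sys


def meets_requirements(system: str, machine: str, version: tuple) -> list:
    """Return a list of unmet requirements; an empty list means all are met.

    Requirements as stated in the skill description: macOS on Apple Silicon,
    Python 3.9 or later. (Hypothetical helper, not shipped with the skill.)
    """
    problems = []
    # platform.system() reports "Darwin" on macOS.
    if system != "Darwin":
        problems.append(f"expected macOS (Darwin), found {system}")
    # A native Python on Apple Silicon reports "arm64"; a Rosetta Python reports "x86_64".
    if machine != "arm64":
        problems.append(f"expected Apple Silicon (arm64), found {machine}")
    # Compare only (major, minor) against the 3.9 minimum.
    if tuple(version[:2]) < (3, 9):
        problems.append(f"expected Python 3.9+, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    issues = meets_requirements(platform.system(), platform.machine(), sys.version_info)
    for issue in issues:
        print("requirement not met:", issue)
    if not issues:
        print("environment looks OK for whisper-mlx-local")
```

Taking the platform values as arguments, rather than reading them inside the function, keeps the check easy to test on machines that are not Macs.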

Use Cases

  • Automating transcriptions for high-volume voice messaging channels on Telegram without subscription costs.
  • Maintaining strict data privacy by processing sensitive audio messages offline on your local machine.
  • Translating multi-lingual audio messages into English automatically using the provided translation flags.
  • Integrating voice-command capabilities into your local agent environment for hands-free system interaction.

Example Prompts

  1. "OpenClaw, please transcribe the latest voice note I received in the Telegram tech channel."
  2. "Can you summarize the voice message that was just sent by John in the family group?"
  3. "Transcribe this audio file and translate the output to English: [path_to_audio_file]."

Tips & Limitations

The first time you run this skill, expect a delay of 10-30 seconds while the Whisper model loads into memory; subsequent transcriptions are much faster. The model is approximately 1.5GB, so make sure you have enough free disk space before the first run. Transcription speed also depends on which Apple Silicon chip you have (for example, M1 versus M3). Finally, double-check the paths in your 'openclaw.json' file: if a path is wrong, the gateway cannot find the transcription script.
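The disk-space and path tips lend themselves to a small pre-flight check. The sketch below is hypothetical and not part of the skill: it compares free space against the roughly 1.5GB model size quoted above and checks that a configured transcription-script path exists as a file. Requiring an absolute path is an assumption, made because the gateway may not run from the directory you expect, so relative paths could resolve somewhere else.

```python
import os
import shutil

# Approximate model size quoted in the skill description (about 1.5GB).
MODEL_SIZE_BYTES = int(1.5e9)


def has_space_for_model(directory: str, required: int = MODEL_SIZE_BYTES) -> bool:
    """True if the filesystem holding `directory` has at least `required` free bytes."""
    return shutil.disk_usage(directory).free >= required


def script_path_problem(path: str):
    """Return None if `path` looks usable as the transcription script, else a reason.

    Hypothetical check: expands "~", then requires an absolute path (an assumption)
    that points at an existing regular file.
    """
    expanded = os.path.expanduser(path)
    if not os.path.isabs(expanded):
        return f"not an absolute path: {path}"
    if not os.path.isfile(expanded):
        return f"file not found: {expanded}"
    return None


if __name__ == "__main__":
    home = os.path.expanduser("~")
    print("enough space for model:", has_space_for_model(home))
```

Pass script_path_problem whatever path you put in 'openclaw.json'; a non-None result explains why the gateway would likely fail to find the script.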

Metadata

Author: @impkind
Stars: 2287
Views: 0
Updated: 2026-03-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-impkind-whisper-mlx-local": {
      "enabled": true,
      "auto_update": true
    }
  }
}
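If your clawhub.json already lists other plugins, pasting the snippet over the file would replace them. A helper like the one below merges the entry instead. It is a hypothetical sketch, not a ClawHub tool: it assumes clawhub.json is a JSON object with a top-level "plugins" map, as the snippet suggests, and the function names are illustrative.

```python
import copy
import json
from pathlib import Path

PLUGIN_ID = "official-impkind-whisper-mlx-local"


def with_plugin_enabled(config: dict, plugin_id: str = PLUGIN_ID, auto_update: bool = True) -> dict:
    """Return a copy of `config` with the plugin entry enabled; other keys are untouched."""
    merged = copy.deepcopy(config)
    plugins = merged.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": auto_update})
    return merged


def update_config_file(path: str) -> None:
    """Read clawhub.json (or start empty), merge the entry, and write it back."""
    p = Path(path)
    current = json.loads(p.read_text()) if p.exists() else {}
    p.write_text(json.dumps(with_plugin_enabled(current), indent=2) + "\n")
```

Keeping the merge as a pure function on a dict separates it from file I/O, so the same logic works whether the config comes from disk or elsewhere.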

Tags (AI)

#whisper #mlx #transcription #privacy #audio
Safety Score: 5/5

Flags: file-read, code-execution