Official Verified media Safety 5/5

mlx-whisper

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

Why use this skill?

Transcribe audio to text locally on your Mac with MLX Whisper. Fast, private, and free speech-to-text integration for OpenClaw using Apple Silicon optimized models.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/kevin37li/mlx-whisper

What This Skill Does

The mlx-whisper skill provides fast, local speech-to-text directly on your Apple Silicon Mac. Built on Apple's open-source MLX framework, it turns audio and video files into text transcripts without cloud-based APIs or paid subscriptions. Because all processing happens on your machine, your data stays private and there is no round trip to a remote API. It supports a range of Whisper models, from the fast, lightweight 'tiny' variant to the more accurate 'large-v3-turbo', so you can trade speed against accuracy to suit your hardware and needs.
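The basic call pattern can be sketched in Python, assuming the `mlx-whisper` package (`pip install mlx-whisper`) and its `mlx_whisper.transcribe` function, which returns a dictionary with a `text` key. The short-name mapping is a hypothetical convenience, and the `mlx-community/...` repo IDs are assumptions to verify on the Hugging Face Hub; the skill itself may resolve models differently.

```python
"""Minimal sketch of a local transcription call (assumes `pip install mlx-whisper`)."""

# Assumed Hugging Face repo IDs for MLX-converted Whisper checkpoints.
# Verify them on the Hub before relying on them.
MODEL_REPOS = {
    "tiny": "mlx-community/whisper-tiny",
    "large-v3-turbo": "mlx-community/whisper-large-v3-turbo",
}


def resolve_model(name: str) -> str:
    """Map a short model name to a repo ID; pass anything else (repo ID or local path) through."""
    return MODEL_REPOS.get(name, name)


def transcribe_file(path: str, model: str = "large-v3-turbo") -> str:
    """Transcribe one audio/video file locally and return the plain-text transcript."""
    # Imported lazily: only usable on Apple Silicon with MLX installed.
    # Audio decoding is assumed to rely on ffmpeg being on PATH, as in upstream Whisper.
    import mlx_whisper

    result = mlx_whisper.transcribe(path, path_or_hf_repo=resolve_model(model))
    return result["text"]
```

For example, `transcribe_file("meeting.mp3", "tiny")` would trade some accuracy for speed on a short clip.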

Installation

To integrate this skill into your workflow, ensure your environment is configured for OpenClaw. Run the following command in your terminal:

clawhub install openclaw/skills/skills/kevin37li/mlx-whisper

Once installed, make sure a Python environment with Apple MLX support is available. Models are downloaded from the Hugging Face Hub and cached under ~/.cache/huggingface/ the first time each one is used.
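A quick environment check can catch problems before the first run. This standard-library sketch tests for a native Apple Silicon interpreter and looks for a model in the Hugging Face cache. It assumes the standard `huggingface_hub` cache layout (`<cache>/models--<org>--<name>`) and the `HF_HUB_CACHE` / `HF_HOME` environment variables; the helper names and the repo ID are illustrative.

```python
import os
import platform
from pathlib import Path
from typing import Optional


def is_apple_silicon() -> bool:
    """True on macOS with a native arm64 Python (a Rosetta x86_64 Python reports False)."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def hf_hub_cache() -> Path:
    """Locate the Hugging Face hub cache.

    Simplified: honours HF_HUB_CACHE, then HF_HOME, then ~/.cache/huggingface;
    it ignores XDG_CACHE_HOME and the legacy HUGGINGFACE_HUB_CACHE variable.
    """
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    hf_home = os.environ.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home).expanduser() / "hub"


def is_model_cached(repo_id: str, cache: Optional[Path] = None) -> bool:
    """Check for the hub's `models--<org>--<name>` folder for a repo ID."""
    cache = cache if cache is not None else hf_hub_cache()
    return (cache / ("models--" + repo_id.replace("/", "--"))).is_dir()


if __name__ == "__main__":
    print("Apple Silicon:", is_apple_silicon())
    print("Cache dir:", hf_hub_cache())
    print("large-v3-turbo cached:", is_model_cached("mlx-community/whisper-large-v3-turbo"))
```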

Use Cases

  • Content Creation: Quickly transcribe long-form recordings, interviews, or podcasts into raw text for blog posts or documentation.
  • Accessibility: Generate SRT subtitle files for video content, ensuring your media is inclusive and searchable.
  • Research: Process academic lectures or meeting recordings without uploading sensitive data to third-party servers.
  • Translation: Use Whisper's built-in translate task to turn foreign-language audio into English text.
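For the subtitle use case, Whisper-style results carry a list of `segments`, each with `start` and `end` times in seconds and a `text` string. Assuming that shape, a minimal SRT writer could look like the following; the helper names are hypothetical, and the skill may produce its SRT files in its own way.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Render Whisper-style segments (dicts with start/end/text) as numbered SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Whisper segment text usually starts with a space, so strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Writing `segments_to_srt(result["segments"])` to a `.srt` file next to the video would give a subtitle track most editors can import.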

Example Prompts

  1. "Transcribe the audio file located at /Users/me/downloads/meeting.mp3 using the large-v3-turbo model and save the result as a text file in my documents folder."
  2. "Generate SRT subtitles for /Volumes/Media/lecture.mp4 so I can add them to my video project."
  3. "Transcribe this Spanish interview file: /data/interview.m4a and translate the output to English directly."
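Each prompt boils down to a few options: which model, whether to translate, the source language, and where to save the output. As a hypothetical sketch, the helpers below turn those into keyword arguments for an `mlx_whisper.transcribe`-style call (`path_or_hf_repo` for the model, plus Whisper's `task` and `language` decoding options) and derive an output file name. How the skill actually parses prompts is not documented here, so treat this as illustrative only.

```python
from pathlib import Path
from typing import Optional


def transcribe_kwargs(model_repo: str, translate: bool = False,
                      language: Optional[str] = None) -> dict:
    """Hypothetical helper: collect options for an mlx_whisper.transcribe-style call."""
    kwargs = {
        "path_or_hf_repo": model_repo,
        # Whisper's "translate" task always produces English text.
        "task": "translate" if translate else "transcribe",
    }
    if language is not None:
        kwargs["language"] = language  # source language, e.g. "es"; omit to auto-detect
    return kwargs


def output_path(audio_file: str, out_dir: str, suffix: str = ".txt") -> Path:
    """Hypothetical helper: name the transcript after the audio file's stem."""
    return Path(out_dir).expanduser() / (Path(audio_file).stem + suffix)


# Prompt 3 from the list above, expressed as options:
interview = transcribe_kwargs("mlx-community/whisper-large-v3-turbo", translate=True, language="es")
```

With the real library, these options would be passed as `mlx_whisper.transcribe(audio_path, **kwargs)`.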

Tips & Limitations

  • Hardware Constraint: This skill is strictly designed for Apple Silicon Macs (M1, M2, M3, M4 series). Performance will vary significantly based on your unified memory capacity.
  • Model Selection: 'large-v3-turbo' is recommended for the best balance of speed and accuracy; use 'tiny' or 'base' if your machine has limited RAM or you need faster results on shorter clips.
  • Storage: Note that models can consume several gigabytes of disk space; check your ~/.cache/huggingface/ folder if you need to manage local storage.
  • Privacy: Because processing is local, audio never leaves your machine, which makes this skill well suited to sensitive or confidential recordings.
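To see how much disk space cached models use, you can total the files under the hub cache. This sketch assumes the standard `huggingface_hub` layout, in which `snapshots/` holds symlinks into `blobs/`, so it skips symlinks to avoid counting the same weights twice. If `huggingface_hub` is installed, its `huggingface-cli scan-cache` command gives a ready-made report.

```python
import os
from pathlib import Path


def cached_model_sizes(cache: Path) -> dict:
    """Return bytes used per `models--*` folder, counting real files and skipping symlinks."""
    sizes = {}
    for model_dir in sorted(cache.glob("models--*")):
        total = 0
        for root, _dirs, files in os.walk(model_dir):
            for name in files:
                path = os.path.join(root, name)
                if not os.path.islink(path):  # snapshot entries are links into blobs/
                    total += os.path.getsize(path)
        sizes[model_dir.name] = total
    return sizes


if __name__ == "__main__":
    default_cache = Path.home() / ".cache" / "huggingface" / "hub"
    if default_cache.is_dir():
        for name, size in cached_model_sizes(default_cache).items():
            print(f"{size / 1e9:6.2f} GB  {name}")
```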

Metadata

Author: @kevin37li
Stars: 1776
Views: 3
Updated: 2026-03-02
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-kevin37li-mlx-whisper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
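To script this change instead of pasting it by hand, a minimal sketch follows. It assumes `clawhub.json` is plain JSON with a top-level `plugins` object, exactly as in the snippet above; the file's location and any other keys are not documented here, and the `enable_plugin` helper is hypothetical. Existing plugins and settings are left untouched.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-kevin37li-mlx-whisper"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Insert or update one plugin entry in clawhub.json, preserving everything else."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # Create "plugins" and the entry if missing; otherwise update only the two flags.
    entry = config.setdefault("plugins", {}).setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Running it twice leaves the file unchanged after the first call, so it is safe to include in a setup script.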

Tags (AI)

#speech-to-text #audio-processing #transcription #mlx #apple-silicon
Safety Score: 5/5

Flags: file-read, file-write