mlx-whisper
Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).
Why use this skill?
Transcribe audio to text locally on your Mac with MLX Whisper. Fast, private, and free speech-to-text integration for OpenClaw using Apple Silicon-optimized models.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/kevin37li/mlx-whisper
What This Skill Does
The mlx-whisper skill provides high-performance, local speech-to-text directly on your Apple Silicon Mac. Built on Apple's open-source MLX framework, it transforms audio and video files into high-quality text transcripts without cloud-based APIs or paid subscriptions. Because all processing occurs locally on your machine, your data remains private and you avoid the latency of remote API requests. It supports a range of Whisper models, from the ultra-fast 'tiny' variant to the highly accurate 'large-v3-turbo', letting you balance speed and accuracy against your hardware and requirements.
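Under the hood, the skill drives the mlx-whisper Python package. A minimal sketch of a direct call is shown below; the model repository name and the shape of the return value are assumptions based on the package's Whisper-style API and may differ across versions:

```python
import mlx_whisper  # pip install mlx-whisper; requires an Apple Silicon Mac

# Transcribe locally; the model weights are downloaded from the
# Hugging Face Hub and cached the first time this model is used.
result = mlx_whisper.transcribe(
    "meeting.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
)

print(result["text"])            # full transcript as one string
for seg in result["segments"]:   # timestamped segments
    print(seg["start"], seg["end"], seg["text"])
```

Swapping in a smaller repo such as a 'tiny' or 'base' variant trades accuracy for speed and memory, as described below.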
Installation
To integrate this skill into your workflow, ensure your environment is configured for OpenClaw. Run the following command in your terminal:
clawhub install openclaw/skills/skills/kevin37li/mlx-whisper
Once installed, ensure your system has the necessary Python environment configured for Apple MLX support. Models are downloaded and cached automatically in your home directory the first time they are invoked.
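For reference, a typical invocation of the underlying command-line tool looks like the sketch below. The flag names are an assumption modeled on the Whisper CLI convention and may differ between versions of the package:

```shell
# Transcribe an audio file with the large-v3-turbo model and write
# the transcript as plain text into ~/Documents. The model is
# fetched from the Hugging Face Hub and cached on first use.
mlx_whisper ~/Downloads/meeting.mp3 \
  --model mlx-community/whisper-large-v3-turbo \
  --output_dir ~/Documents \
  --output_format txt
```

Changing `--output_format txt` to `srt` or `vtt` produces subtitle files instead of plain text.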
Use Cases
- Content Creation: Quickly transcribe long-form recordings, interviews, or podcasts into raw text for blog posts or documentation.
- Accessibility: Generate SRT subtitle files for video content, ensuring your media is inclusive and searchable.
- Research: Process academic lectures or meeting recordings without uploading sensitive data to third-party servers.
- Translation: Utilize the built-in translation task to bridge language barriers by converting foreign-language audio into English text transcriptions.
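As an illustration of the subtitle use case above: a Whisper-style transcription result contains timestamped segments, and converting them to SRT is mechanical. The segment shape below (a list of dicts with `start`, `end`, and `text` keys) matches what Whisper-family tools generally return, but treat the exact structure as an assumption:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render Whisper-style segments as an SRT subtitle document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Hypothetical segments for illustration:
demo = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": " Let's get started."},
]
print(segments_to_srt(demo))
```

Each SRT block is a sequence number, a `start --> end` timestamp line, and the caption text, separated by blank lines.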
Example Prompts
- "Transcribe the audio file located at /Users/me/downloads/meeting.mp3 using the large-v3-turbo model and save the result as a text file in my documents folder."
- "Generate SRT subtitles for /Volumes/Media/lecture.mp4 so I can add them to my video project."
- "Transcribe this Spanish interview file: /data/interview.m4a and translate the output to English directly."
Tips & Limitations
- Hardware Constraint: This skill requires an Apple Silicon Mac (M1, M2, M3, or M4 series). Performance will vary significantly based on your unified memory capacity.
- Model Selection: While 'large-v3-turbo' is recommended for the best balance of speed and accuracy, use 'tiny' or 'base' if you are running on a machine with limited RAM or need instantaneous results for shorter audio clips.
- Storage: Note that models can consume several gigabytes of disk space; check your ~/.cache/huggingface/ folder if you need to manage local storage.
- Privacy: Because processing is local, this is one of the most secure ways to handle sensitive or confidential audio data.
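To see how much space cached models occupy, a small helper like the one below can walk the cache directory. The `models--*` naming and the `hub` subdirectory follow the default Hugging Face cache layout, which is an assumption about where your models land:

```python
from pathlib import Path

def dir_size_bytes(path: Path) -> int:
    """Total size of all files under path (0 if it doesn't exist)."""
    if not path.exists():
        return 0
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())

# Default Hugging Face model cache location.
cache = Path.home() / ".cache" / "huggingface" / "hub"
for model_dir in sorted(cache.glob("models--*")):
    print(f"{dir_size_bytes(model_dir) / 1e9:.2f} GB  {model_dir.name}")
```

Deleting an individual `models--…` directory frees that model's space; it will simply be re-downloaded on next use.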
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-kevin37li-mlx-whisper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: file-read, file-write
Related Skills
gettr-transcribe-summarize
Download audio from a GETTR post (via HTML og:video), transcribe it locally with MLX Whisper on Apple Silicon (with timestamps via VTT), and summarize the transcript into bullet points and/or a timestamped outline. Use when given a GETTR post URL and asked to produce a transcript or summary.
gettr-transcribe
Download audio from a GETTR post or streaming page and transcribe it locally with MLX Whisper on Apple Silicon (with timestamps via VTT). Use when given a GETTR URL and asked to produce a transcript. Summarization is handled by the caller.