mlx-local-inference
Use when calling local AI on this Mac — text generation, embeddings, speech-to-text, OCR, or image understanding. LLM/VLM via oMLX gateway at localhost:8000/v1. Embedding/ASR/OCR via Python libraries (mlx-lm, mlx-vlm, mlx-audio). Works offline. Use instead of cloud APIs for privacy or low latency.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/bendusy/mlx-local-inferenceWhat This Skill Does
The mlx-local-inference skill empowers your OpenClaw agent to leverage Apple Silicon's hardware acceleration for private, high-performance AI inference directly on your machine. By bypassing cloud APIs, this skill ensures your sensitive data stays local while providing low-latency execution for tasks including text generation, vision-language analysis, speech transcription, and OCR. It integrates the oMLX gateway for continuous LLM batching and utilizes 'uv' for transient Python execution of specialized libraries like mlx-lm, mlx-vlm, and mlx-audio.
Installation
To integrate this capability into your agent, use the ClawKit CLI to install the dependency from the centralized repository. Run the following command in your terminal:
clawhub install openclaw/skills/skills/bendusy/mlx-local-inference
Once installed, ensure your local models are correctly placed in your ~/models directory, as the skill expects specific weight files for Qwen and PaddleOCR configurations. Verify your oMLX service status via curl http://localhost:8000/v1/models.
Use Cases
This skill is ideal for workflows requiring data sovereignty and offline availability. Use it for:
- Private Document Analysis: Extract text from scanned PDFs or images using local OCR without uploading to a third-party server.
- Real-time Audio Transcription: Convert local meeting recordings or voice memos to text using quantized ASR models.
- Latency-Critical Agent Flows: Execute complex LLM reasoning chains locally to avoid network bottlenecks.
- High-Volume Embedding Tasks: Generate vector representations for local search and retrieval augmented generation (RAG) tasks.
Example Prompts
- "Analyze the attached invoice image using local OCR and extract the total amount and merchant name."
- "Transcribe the file 'meeting_notes.wav' located in my downloads folder using the local ASR engine."
- "Summarize this private legal document using the Qwen3.5-35B local model, ensuring the data never leaves my Mac."
Tips & Limitations
- Performance: Always ensure your Mac is plugged into power during heavy inference, as Apple Silicon may throttle performance to preserve battery life.
- Resource Management: The oMLX stack is optimized for continuous batching, but loading large models like Qwen3.5-35B consumes significant unified memory. Close memory-intensive apps like browsers or creative suites when running large models.
- Versioning: For ASR and OCR tasks requiring
uv, always use--python 3.11to prevent potential SIGSEGV errors associated with OpenMP and Python's threading model on macOS. - Privacy: This is a purely offline-first tool. If you require zero data egress, ensure your firewall is configured to block unexpected outbound requests from the agent process.
Metadata
Not sure this is the right skill?
Describe what you want to build — we'll match you to the best skill from 16,000+ options.
Find the right skillPaste this into your clawhub.json to enable this plugin.
{
"plugins": {
"official-bendusy-mlx-local-inference": {
"enabled": true,
"auto_update": true
}
}
}Tags(AI)
Flags: file-read, file-write, code-execution