Official Verified media Safety 4/5

mlx-audio-server

Local 24x7 OpenAI-compatible API server for STT/TTS, powered by MLX on your Mac.

Why use this skill?

Power your OpenClaw agents with local, high-speed Speech-To-Text and Text-To-Speech using MLX on Apple Silicon. Fast, private, and 24/7.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/guoqiao/mlx-audio-server

What This Skill Does

The mlx-audio-server skill is a local-first bridge between OpenClaw and the hardware in your Apple Silicon Mac. It deploys an OpenAI-compatible API server built on the MLX framework and exposes low-latency Speech-To-Text (STT) and Text-To-Speech (TTS) operations. Because MLX runs models on the Mac's own CPU and GPU using unified memory, audio is processed entirely on your machine and never leaves it. Voice-enabled agents can use the server to "hear" audio files and "speak" responses without relying on paid or privacy-invasive cloud APIs.
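Since the server is described as OpenAI-compatible, a text-to-speech call can probably be made with a plain HTTP POST. The following is a minimal sketch, not the skill's documented interface. It assumes the server follows the OpenAI convention of a /v1/audio/speech endpoint. The host, the port 8000, the "voice" field, and the exact model identifier string are all assumptions, so check your own installation before relying on them.

```python
import json
import urllib.request

# Assumption: placeholder address; the real host and port depend on your setup.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text, model="Qwen3-TTS", voice=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for an OpenAI-style speech endpoint.

    The model name mirrors the default TTS model named on this page; the exact
    identifier the server expects is an assumption.
    """
    payload = {"model": model, "input": text}
    if voice is not None:
        payload["voice"] = voice  # hypothetical; supported voices are not listed here
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return out_path
```

Calling synthesize("The automation workflow completed successfully", "out.wav") would then save the audio next to your script, provided the server is running at the assumed address.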

Installation

To integrate this capability, run the command clawhub install openclaw/skills/skills/guoqiao/mlx-audio-server in your terminal. This command pulls the necessary scripts and dependencies, including the mlx-audio-server Homebrew formula from guoqiao/tap. The installation process automatically verifies that critical utilities like ffmpeg and jq are present on your system. It also registers the server as a macOS LaunchAgent, ensuring the audio server remains active in the background, ready to process requests 24/7 without manual intervention.
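If you want to confirm the prerequisites yourself before installing, a check along these lines may help. It is only an illustration of the kind of verification described above, not the installer's actual logic. The tool names ffmpeg and jq come from this page; the helper name missing_tools is hypothetical.

```python
import shutil

# Tools this page says the installer checks for.
REQUIRED_TOOLS = ("ffmpeg", "jq")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]
```

An empty list from missing_tools() means both utilities were found on your PATH. Any name that is returned can typically be installed with Homebrew.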

Use Cases

  • Voice-to-Text Transcription: Automatically transcribe meetings, interviews, or voice memos directly on your Mac.
  • Voice-Enabled Agent Interaction: Enable OpenClaw to speak its responses aloud, creating a more human-like interface for your automation workflows.
  • Offline Media Processing: Analyze video or audio assets locally for content extraction or indexing without uploading files to third-party services.
  • Privacy-First Dictation: Use the system as a local backend for building dictation tools that never transmit sensitive audio data over the internet.

Example Prompts

  1. "Transcribe this audio file located at /Users/me/downloads/meeting.mp3 and summarize the key action items."
  2. "Convert this text: 'The automation workflow completed successfully' into an audio file and save it to my current directory."
  3. "Convert the recording at ./input_voice.wav into text so I can search for the specific mention of the budget update."
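Behind prompts like these, the agent presumably uploads the audio file to the local server. The sketch below shows what such a call could look like using only the Python standard library. It assumes an OpenAI-style /v1/audio/transcriptions endpoint that accepts multipart form data and returns JSON with a "text" field. The address, port, and model identifier are assumptions, and the helper names are hypothetical.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumption: placeholder address; the real host and port depend on your setup.
BASE_URL = "http://localhost:8000/v1"


def encode_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(audio_path, model="glm-asr-nano", base_url=BASE_URL):
    """Build (but do not send) the upload request for one audio file."""
    path = Path(audio_path)
    body, content_type = encode_multipart(
        {"model": model}, "file", path.name, path.read_bytes()
    )
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def transcribe(audio_path, **kwargs):
    """Send the file and return the transcript (assumes an OpenAI-style JSON reply)."""
    request = build_transcription_request(audio_path, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

With the server running at the assumed address, transcribe("./input_voice.wav") would return the transcript as a string, which you can then search or summarize.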

Tips & Limitations

  • Hardware Constraint: This skill requires Apple Silicon (M1, M2, M3, or M4 chips). It does not run on Intel-based Macs.
  • Resource Usage: MLX models can be memory-intensive. Although MLX is optimized for Apple Silicon, make sure enough RAM is free before processing long audio files.
  • Default Models: The server comes pre-configured with glm-asr-nano for transcription and Qwen3-TTS for speech synthesis. While these are highly efficient, you can modify the underlying scripts to swap models if your workflow requires higher accuracy or different voice characteristics.

Metadata

Author: @guoqiao
Stars: 2387
Views: 6
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-guoqiao-mlx-audio-server": {
      "enabled": true,
      "auto_update": true
    }
  }
}
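If clawhub.json already contains other plugins, pasting the snippet over the file would discard them. A small merge script is one way around that. This is a sketch under the assumption that clawhub.json is ordinary JSON with a top-level "plugins" object, as the snippet above suggests. The function names are hypothetical and not part of any ClawHub tooling.

```python
import json
from pathlib import Path

# Plugin key taken from the configuration snippet on this page.
PLUGIN_ID = "official-guoqiao-mlx-audio-server"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of `config` with the plugin enabled; other entries are kept."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path="clawhub.json"):
    """Read the config file (or start empty), enable the plugin, and write it back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Running update_config_file() in the directory that holds clawhub.json adds the entry while leaving existing plugin settings untouched.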

Tags (AI)

#mlx #audio #stt #tts #apple-silicon
Safety Score: 4/5

Flags: file-write, file-read, code-execution