Official Verified media Safety 4/5

mlx-audio-server

Local, always-on (24/7) OpenAI-compatible API server for STT/TTS, powered by MLX on your Mac.

Why use this skill?

Give your OpenClaw agents local Speech-To-Text and Text-To-Speech that runs through MLX on Apple Silicon: fast, private, and available 24/7.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/guoqiao/mlx-audio-server

What This Skill Does

The mlx-audio-server skill is a local-first bridge between OpenClaw and the hardware in your Apple Silicon Mac. It deploys an OpenAI-compatible API server built on the MLX framework and exposes Speech-To-Text (STT) and Text-To-Speech (TTS) operations. MLX runs models on the Mac's GPU and CPU over unified memory (it does not target the Apple Neural Engine), so audio is processed quickly and never leaves your machine. Voice-enabled agents can use the server to "hear" audio files and "speak" responses without relying on a paid cloud speech API that would receive your audio.
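
Because the server follows the OpenAI API shape, a client can reach it with an ordinary HTTP request. The Python sketch below only builds a speech-synthesis request and does not send it. The base URL http://localhost:8000, the /v1/audio/speech path (borrowed from OpenAI's API), and the model string used in the demo are assumptions; this page does not state the server's address or exact model identifiers, so check the skill's documentation before relying on them.

```python
import json
import urllib.request

# Assumption: host and port are placeholders; this page does not state them.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, model, voice=None, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {"model": model, "input": text}
    if voice is not None:
        payload["voice"] = voice
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",  # path follows OpenAI's API (assumed)
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "Qwen3-TTS" is a placeholder; the real model identifier may differ.
    req = build_speech_request(
        "The automation workflow completed successfully", model="Qwen3-TTS"
    )
    print(req.get_method(), req.full_url)
    # To call a running server and save the audio, something like:
    # with urllib.request.urlopen(req) as resp, open("speech.wav", "wb") as out:
    #     out.write(resp.read())
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a live server.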

Installation

Run clawhub install openclaw/skills/skills/guoqiao/mlx-audio-server in your terminal. The command pulls the required scripts and dependencies, including the mlx-audio-server Homebrew formula from guoqiao/tap. During installation, the skill checks that ffmpeg and jq are present on your system. It also registers the server as a macOS LaunchAgent, so the audio server keeps running in the background and is ready to process requests 24/7 without manual intervention.
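
If you want to confirm the prerequisites yourself before installing, a check along the following lines may help. This is a minimal sketch that looks for ffmpeg and jq on your PATH; it is not the installer's own logic, and the function name missing_tools is made up for illustration.

```python
import shutil

REQUIRED_TOOLS = ("ffmpeg", "jq")  # the utilities the installer is said to check


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The which parameter exists only so the lookup can be swapped out in tests; in normal use the default shutil.which is enough.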

Use Cases

Voice-to-Text Transcription: Automatically transcribe meetings, interviews, or voice memos directly on your Mac.
Voice-Enabled Agent Interaction: Enable OpenClaw to speak its responses aloud, creating a more human-like interface for your automation workflows.
Offline Media Processing: Analyze video or audio assets locally for content extraction or indexing without uploading files to third-party services.
Privacy-First Dictation: Use the system as a local backend for building dictation tools that never transmit sensitive audio data over the internet.
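
The transcription use cases above come down to uploading an audio file to the server. The sketch below builds such an upload as a multipart form using only the Python standard library, again without sending it. The base URL, the /v1/audio/transcriptions path (taken from OpenAI's API), and the model string in the demo are assumptions rather than values confirmed by this page.

```python
import uuid
import urllib.request

# Assumption: host and port are placeholders; this page does not state them.
BASE_URL = "http://localhost:8000"


def build_transcription_request(filename, audio_bytes, model, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style multipart transcription request."""
    boundary = uuid.uuid4().hex
    # Text part carrying the model name, then the header of the file part.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",  # OpenAI-style path (assumed)
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    # Dummy bytes stand in for a real recording; "glm-asr-nano" is a placeholder.
    req = build_transcription_request("memo.wav", b"\x00\x01", model="glm-asr-nano")
    print(req.get_method(), req.full_url, len(req.data), "bytes")
    # With a running server, urllib.request.urlopen(req) would return the result;
    # OpenAI's API answers with JSON containing a "text" field, which this
    # server may or may not mirror.
```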

Example Prompts

"Transcribe this audio file located at /Users/me/downloads/meeting.mp3 and summarize the key action items."
"Convert this text: 'The automation workflow completed successfully' into an audio file and save it to my current directory."
"Convert the recording at ./input_voice.wav into text so I can search for the specific mention of the budget update."

Tips & Limitations

Hardware Constraint: This skill requires Apple Silicon (M1, M2, M3, or M4 chips). It will not function on Intel-based Macs.
Resource Usage: MLX models can be memory-intensive. While they are highly optimized for macOS, ensure you have sufficient RAM available when processing long audio files.
Default Models: The server comes pre-configured with glm-asr-nano for transcription and Qwen3-TTS for speech synthesis. While these are highly efficient, you can modify the underlying scripts to swap models if your workflow requires higher accuracy or different voice characteristics.
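
The hardware constraint can be checked up front. The following is a small sketch, not part of the skill, that reports whether the current Python process is running natively on an Apple Silicon Mac; the function name is_apple_silicon is illustrative.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when running natively on an Apple Silicon Mac.

    Caveat: a Python interpreter running under Rosetta reports "x86_64",
    so this returns False there even though the hardware is Apple Silicon.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    if is_apple_silicon():
        print("Apple Silicon detected.")
    else:
        print("Not running natively on Apple Silicon; MLX will not work here.")
```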

Metadata

Author: @guoqiao

Stars: 2387

Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-guoqiao-mlx-audio-server": {
      "enabled": true,
      "auto_update": true
    }
  }
}
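
If your clawhub.json already lists other plugins, pasting by hand risks overwriting them. The sketch below merges the entry shown above into an existing configuration dictionary instead. It assumes the file is plain JSON with a top-level "plugins" object, as in the snippet; the helper name enable_plugin and the sample "other-plugin" entry are illustrative only.

```python
import json

PLUGIN_ID = "official-guoqiao-mlx-audio-server"


def enable_plugin(config):
    """Return a copy of config with the mlx-audio-server plugin enabled."""
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))  # keep any existing plugins
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    updated["plugins"] = plugins
    return updated


if __name__ == "__main__":
    # Hypothetical existing configuration used only for demonstration.
    existing = {"plugins": {"other-plugin": {"enabled": False}}}
    print(json.dumps(enable_plugin(existing), indent=2))
    # To apply this to a real file, load it with json.load, pass the result
    # through enable_plugin, and write it back with json.dump.
```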

Tags (AI)

#mlx #audio #stt #tts #apple-silicon

Safety Score: 4/5

Flags: file-write, file-read, code-execution

Related Skills

All by guoqiao:

mlx-stt: Speech-To-Text with MLX (Apple Silicon) and open-source models (default GLM-ASR-Nano-2512), running locally.
dl: Download Video/Music from YouTube/Bilibili/X/etc.
url2pdf: Convert URL to PDF suitable for mobile reading.
uv-global: Provision and reuse a global uv environment for ad hoc Python scripts.
url2png: Convert URL to PNG suitable for mobile reading.