What This Skill Does

The local-voice skill gives OpenClaw a fast, privacy-focused voice interface on Apple Silicon hardware. It runs FluidAudio's CoreML models on the Apple Neural Engine to provide both Text-to-Speech (TTS) and Speech-to-Text (STT) with sub-second latency. Because synthesis and transcription run entirely on your local machine, the skill needs no external cloud API, keeps your audio and text on the device, and keeps working without an internet connection.

Installation

Installation requires macOS 14 or later on an Apple Silicon Mac (M1 through M4). The steps are:

1. Install the system dependency: brew install espeak-ng
2. From the skill's source directory, compile the daemon: swift build -c release
3. Run the provided installation script, which installs the binary and framework into your local environment and sets up the required runtime paths.
4. Register the daemon by creating and loading a standard macOS LaunchAgent, so the local-voice service starts automatically when you log in.
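The LaunchAgent used to register the daemon is an ordinary launchd property list. As a minimal sketch, the Python snippet below builds one with the standard-library plistlib module. The label com.example.local-voice, the binary path, and the log paths are hypothetical placeholders; the real values depend on where the installation script puts the daemon and on any arguments it expects.

```python
import plistlib
from pathlib import Path

# Hypothetical placeholders: replace with the label and binary path your install actually uses.
LABEL = "com.example.local-voice"
BINARY = Path.home() / ".local" / "bin" / "local-voice"


def build_launch_agent(label: str, program: Path) -> dict:
    """Return a minimal launchd job definition for a per-user daemon."""
    return {
        "Label": label,                      # unique identifier for the job
        "ProgramArguments": [str(program)],  # argv list; the first entry is the executable
        "RunAtLoad": True,                   # start the job as soon as the agent is loaded
        "KeepAlive": True,                   # ask launchd to relaunch the daemon if it exits
        # Hypothetical log locations, handy when debugging startup failures.
        "StandardOutPath": f"/tmp/{label}.out.log",
        "StandardErrorPath": f"/tmp/{label}.err.log",
    }


if __name__ == "__main__":
    job = build_launch_agent(LABEL, BINARY)
    # plistlib.dumps produces an XML property list as bytes by default.
    print(plistlib.dumps(job).decode("utf-8"))
```

Save the printed output as ~/Library/LaunchAgents/com.example.local-voice.plist and load it with launchctl load ~/Library/LaunchAgents/com.example.local-voice.plist (recent macOS releases also accept launchctl bootstrap gui/$(id -u) followed by the same path). Because per-user agents in that directory are loaded at login, RunAtLoad is what makes the service start automatically each time you log in.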

Use Cases

This skill suits users who want to replace paid cloud TTS/STT services with a local alternative that has no per-request cost. It is well suited to building low-latency voice assistants that interact with OpenClaw, to hands-free system control, and to transcribing audio files locally in security-sensitive workflows where data must not leave the device.

Example Prompts

"OpenClaw, transcribe the file audio.wav using the local-voice daemon and save the output to my documents."
"Use the local-voice synthesizer to read the contents of this text file using the af_heart voice profile."
"Please initialize the speech-to-text service and begin listening for commands for the next 60 seconds."

Tips & Limitations

To tune delivery, adjust the speed parameter: 1.0 generally sounds the most natural, while 0.8 works well for calming, meditative applications. For the best transcription accuracy from the Parakeet TDT v3 model, make sure your microphone is set up correctly, with appropriate input levels. The skill is built specifically for Apple Silicon; other intensive workloads running at the same time can reduce the Neural Engine's available headroom and increase latency. Finally, check the list of voices in the provided documentation to confirm that your chosen voice profile suits your TTS task.
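The speed guidance above can be captured in a small settings object. This is a hypothetical sketch: SpeechSettings, SPEED_PRESETS, and the preset names are illustrative and are not part of local-voice's interface. The duration estimate also assumes that output length scales inversely with speed, so 0.8 would yield audio roughly 1.25 times as long as at 1.0.

```python
from dataclasses import dataclass

# Speed values from the tips above; the preset names themselves are hypothetical.
SPEED_PRESETS = {
    "natural": 1.0,  # generally the most natural-sounding rate
    "calm": 0.8,     # slower delivery for calming, meditative content
}


@dataclass(frozen=True)
class SpeechSettings:
    """Hypothetical TTS settings container; not local-voice's actual API."""

    voice: str = "af_heart"
    speed: float = 1.0

    def __post_init__(self) -> None:
        # A zero or negative speed has no meaningful playback rate.
        if self.speed <= 0:
            raise ValueError(f"speed must be positive, got {self.speed}")

    @classmethod
    def from_preset(cls, preset: str, voice: str = "af_heart") -> "SpeechSettings":
        # Raises KeyError for an unknown preset name.
        return cls(voice=voice, speed=SPEED_PRESETS[preset])

    def estimated_duration(self, seconds_at_normal_speed: float) -> float:
        # Assumption: audio length scales inversely with the speed factor.
        return seconds_at_normal_speed / self.speed


if __name__ == "__main__":
    calm = SpeechSettings.from_preset("calm")
    print(calm)
    print(round(calm.estimated_duration(10.0), 2))
```

Keeping the presets in one table makes it easy to add new ones after listening tests, while the validation in __post_init__ rejects nonsensical speeds before any request reaches the daemon.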
