Official Verified utilities Safety 4/5

local-voice

Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, voice assistant integration, or replacing cloud TTS/STT services.

Why use this skill?

Enable high-speed, local TTS and STT on your Mac using the local-voice skill. Privacy-first, offline-capable voice AI powered by the Apple Neural Engine.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/trondw/local-voice

What This Skill Does

The local-voice skill provides a fast, privacy-focused voice interface for OpenClaw on Apple Silicon hardware. Using FluidAudio's CoreML models, it delivers sub-second latency for both text-to-speech (TTS) and speech-to-text (STT). Synthesis and transcription run on the Apple Neural Engine, entirely on your local machine, so the skill does not rely on external cloud APIs, reduces privacy risks, and keeps voice services working without an internet connection.

Installation

Installation requires macOS 14 or later on Apple Silicon (M1-M4). First, install the system dependency by running brew install espeak-ng. Next, go to the source directory and compile the daemon with swift build -c release. After building, install the binary and framework using the provided installation script, which sets up the necessary runtime paths. Finally, register the daemon by creating and loading a standard macOS LaunchAgent, so that the local-voice service starts automatically when you log in.
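
The prerequisite check and the LaunchAgent step can be sketched in Python. This is a minimal sketch under stated assumptions, not the skill's own installer: the label com.example.local-voice and the binary path are hypothetical placeholders, and KeepAlive is an assumed choice. Only the launchd keys themselves (Label, ProgramArguments, RunAtLoad, KeepAlive) are standard.

```python
import platform
import plistlib


def meets_requirements(system: str, machine: str, macos_version: str) -> bool:
    """Return True for macOS 14 or later on Apple Silicon (arm64)."""
    if system != "Darwin" or machine != "arm64":
        return False
    try:
        major = int(macos_version.split(".")[0])
    except ValueError:
        # Empty or malformed version string (e.g. not running on macOS).
        return False
    return major >= 14


def launch_agent_plist(label: str, binary_path: str) -> bytes:
    """Build a LaunchAgent property list that starts the daemon at login."""
    agent = {
        "Label": label,
        "ProgramArguments": [binary_path],
        "RunAtLoad": True,  # start as soon as the agent is loaded
        "KeepAlive": True,  # assumption: restart the daemon if it exits
    }
    return plistlib.dumps(agent)


if __name__ == "__main__":
    ok = meets_requirements(
        platform.system(), platform.machine(), platform.mac_ver()[0]
    )
    print("prerequisites met:", ok)
    # Hypothetical label and path; use the values from the skill's documentation.
    plist = launch_agent_plist(
        "com.example.local-voice", "/usr/local/bin/local-voice"
    )
    print(plist.decode())
```

A plist like this would typically be saved under ~/Library/LaunchAgents/ and loaded with launchctl. Note that a Python interpreter running under Rosetta reports x86_64, so the check can return False on Apple Silicon in that case.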

Use Cases

This skill is perfect for users looking to replace costly cloud-based TTS/STT services with a zero-cost local alternative. It is ideal for building low-latency voice assistants that interact with OpenClaw, enabling hands-free system control, or transcribing audio files locally for security-sensitive workflows where data must not leave the device.

Example Prompts

  1. "OpenClaw, transcribe the file audio.wav using the local-voice daemon and save the output to my documents."
  2. "Use the local-voice synthesizer to read the contents of this text file using the af_heart voice profile."
  3. "Please initialize the speech-to-text service and begin listening for commands for the next 60 seconds."

Tips & Limitations

Tune the speed parameter to match the tone you want: 1.0 is generally the most natural, while 0.8 works well for calming, meditative applications. Make sure your audio input hardware is properly calibrated to get the most out of the Parakeet TDT v3 model. This skill is optimized specifically for Apple Silicon; it runs efficiently, but intensive concurrent tasks may reduce the Neural Engine's available headroom. Check your voice selection against the provided documentation to confirm that the chosen profile suits your TTS task.

Metadata

Author: @trondw
Stars: 946
Views: 1
Updated: 2026-02-13

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-trondw-local-voice": {
      "enabled": true,
      "auto_update": true
    }
  }
}
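
If clawhub.json already lists other plugins, pasting the snippet over the file would discard them. The following is a minimal Python sketch of merging the entry instead. It assumes only the structure shown above (a top-level "plugins" object keyed by plugin ID); the helper name enable_plugin and the "some-other-plugin" entry are hypothetical.

```python
import json


def enable_plugin(config: dict, plugin_id: str, auto_update: bool = True) -> dict:
    """Return a copy of config with plugin_id enabled, keeping other entries."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": auto_update})
    plugins[plugin_id] = entry
    merged["plugins"] = plugins
    return merged


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated plugin.
    existing = json.loads('{"plugins": {"some-other-plugin": {"enabled": true}}}')
    updated = enable_plugin(existing, "official-trondw-local-voice")
    print(json.dumps(updated, indent=2))
```

The function copies the dictionaries it modifies, so the original configuration object is left unchanged and can be compared against the result before writing anything to disk.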

Tags(AI)

#tts #stt #voice-ai #apple-silicon #local-ai
Safety Score: 4/5

Flags: network-access, file-write, file-read, code-execution