Official · Verified · AI models · Safety 4/5

willow-inference-server

Local ASR and TTS inference server. Use when the user wants to transcribe audio to text (ASR) or convert text to speech (TTS). Requires a running Willow Inference Server instance. Supports Whisper for ASR and custom TTS voices.

Why use this skill?

Integrate local speech-to-text and text-to-speech capabilities into OpenClaw. Transcribe audio and generate voice responses using Willow.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/deantiwang/willow-inference-server

What This Skill Does

The Willow Inference Server skill bridges the OpenClaw agent and a self-hosted Willow Inference Server instance, providing local, privacy-focused speech-to-text (ASR) and text-to-speech (TTS). It uses Whisper for transcription and the voice models loaded on the server for speech. With it, the agent can process audio input, transcribe and summarize meetings and voice memos, and give spoken responses with adjustable parameters such as speed and volume. It is intended for users who want self-hosted voice interaction without relying on external cloud services.

Installation

To install this skill, run: clawhub install openclaw/skills/skills/deantiwang/willow-inference-server. Before using the skill, make sure a Willow Inference Server instance is running. Follow the setup guide in the repository to clone, configure, and initialize the server with its setup scripts. Once the server is running, set the WILLOW_BASE_URL environment variable, either in your shell or in the OpenClaw configuration file, so that it points to your server instance (e.g., https://your-hostname:19000).
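
A misconfigured WILLOW_BASE_URL is an easy mistake to make at this step, so it can help to check the value before sending any requests. The following is a minimal sketch, not part of the skill itself: the helper name resolve_base_url is hypothetical, and the only thing taken from this page is the variable name WILLOW_BASE_URL and the example URL.

```python
import os
from urllib.parse import urlsplit


def resolve_base_url(env=None):
    """Return a validated base URL read from WILLOW_BASE_URL.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This helper is illustrative and is not shipped with the skill.
    """
    env = os.environ if env is None else env
    raw = env.get("WILLOW_BASE_URL", "").strip()
    if not raw:
        raise ValueError("WILLOW_BASE_URL is not set")

    parts = urlsplit(raw)
    # Require an explicit http(s) scheme and a host name.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"WILLOW_BASE_URL is not a valid http(s) URL: {raw!r}")

    # Drop trailing slashes so endpoint paths can be appended predictably.
    return raw.rstrip("/")


if __name__ == "__main__":
    print(resolve_base_url({"WILLOW_BASE_URL": "https://your-hostname:19000/"}))
```

Passing the environment as a mapping keeps the check easy to test without touching the real shell environment.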

Use Cases

This skill suits scenarios such as:

  1. Transcription: Automatically transcribe recordings, meetings, or voice memos into text for documentation purposes.
  2. Audio Feedback: Provide verbal responses from the agent for better accessibility or hands-free operation.
  3. Voice Personalization: Use specific voice profiles such as 'af_sarah' or 'am_michael' for specific brand personas.
  4. Content Creation: Convert text drafts into audio files for podcasts or voiceovers.

Example Prompts

  • "Transcribe this audio file for me: /path/to/meeting.mp3 and summarize the key action items."
  • "Speak the following text using the am_michael voice at 1.1 speed: 'The system update is complete.'"
  • "Listen to my latest audio recording and extract the main points into a markdown file."
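
The second prompt above implies a TTS request that carries the text, a voice name, and a speed. A minimal sketch of how such a request URL might be assembled follows. The /api/tts path and the query parameter names text, speaker, and speed are assumptions made for illustration and are not confirmed by this page, so check your server's API documentation before relying on them. The function name build_tts_url is likewise hypothetical.

```python
from urllib.parse import urlencode


def build_tts_url(base_url, text, voice="af_sarah", speed=1.0):
    """Assemble a TTS request URL from text, voice, and speed.

    ASSUMPTION: the endpoint path and parameter names below are
    placeholders, not a documented Willow Inference Server contract.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if speed <= 0:
        raise ValueError("speed must be positive")

    # urlencode percent-encodes values and joins them with '&'.
    query = urlencode({"text": text, "speaker": voice, "speed": speed})
    return f"{base_url.rstrip('/')}/api/tts?{query}"


if __name__ == "__main__":
    print(build_tts_url(
        "https://your-hostname:19000",
        "The system update is complete.",
        voice="am_michael",
        speed=1.1,
    ))
```

Validating the text and speed before building the URL means an obviously bad request fails locally instead of on the server.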

Tips & Limitations

If the server runs on a separate machine, make sure the network connection to it is stable. For the best transcription results, use clean audio sources and specify the language code when you know it, rather than relying on 'auto'. TTS quality depends heavily on the model loaded on the server. Always verify your server certificate setup, as Willow Inference Server typically enforces HTTPS. This skill does not store your audio logs; it acts as a direct pass-through, so make sure your local server instance manages storage for the generated audio output files.
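
Two of these tips can be sketched in client code: keeping certificate verification on when the server uses a private or self-signed certificate, and passing an explicit language code instead of 'auto'. This is a minimal sketch under stated assumptions. The helper names make_tls_context and asr_params and the parameter name language are hypothetical, and only the 'auto' value comes from this page. The ssl calls are from the Python standard library.

```python
import ssl


def make_tls_context(ca_file=None):
    """Build a TLS context that still verifies the server certificate.

    `ca_file` is an optional path to a PEM file holding the CA (or the
    self-signed certificate) that your server presents. With None, the
    system trust store is used.
    """
    # create_default_context enables hostname checking and CERT_REQUIRED.
    return ssl.create_default_context(cafile=ca_file)


def asr_params(language=None):
    """Return request parameters for transcription.

    ASSUMPTION: the parameter name 'language' is a placeholder.
    An explicit code is preferred; 'auto' is only the fallback.
    """
    if language and language.strip().lower() != "auto":
        return {"language": language.strip().lower()}
    return {"language": "auto"}


if __name__ == "__main__":
    ctx = make_tls_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, asr_params("EN"))
```

Trusting a specific certificate file is generally safer than disabling verification, because the connection is still checked against a certificate you chose.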

Metadata

Stars: 2387
Views: 0
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-deantiwang-willow-inference-server": {
      "enabled": true,
      "auto_update": true
    }
  }
}
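
If clawhub.json already lists other plugins, pasting the snippet over the file would discard them. A minimal sketch of merging the entry into an existing file's contents follows. The helper name enable_plugin is hypothetical, and the sketch assumes clawhub.json is plain JSON with a top-level "plugins" object, as the snippet above suggests.

```python
import json

# Plugin key and settings copied from the configuration snippet above.
PLUGIN_ID = "official-deantiwang-willow-inference-server"


def enable_plugin(config_text):
    """Return config JSON text with the Willow plugin entry added.

    `config_text` is the current content of clawhub.json (may be empty).
    Existing plugin entries are left untouched.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    plugins = config.setdefault("plugins", {})
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    existing = '{"plugins": {"other-plugin": {"enabled": false}}}'
    print(enable_plugin(existing))
```

The function works on text rather than a file path, so you can review the result before writing it back to disk.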

Tags (AI)

#speech #audio #asr #tts #privacy
Safety Score: 4/5

Flags: network-access, file-write, file-read