willow-inference-server
Local ASR and TTS inference server. Use when the user wants to transcribe audio to text (ASR) or convert text to speech (TTS). Requires a running Willow Inference Server instance. Supports Whisper for ASR and custom TTS voices.
Why use this skill?
Integrate local speech-to-text and text-to-speech capabilities into OpenClaw. Transcribe audio and generate voice responses using Willow.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/deantiwang/willow-inference-server
What This Skill Does
The Willow Inference Server skill connects the OpenClaw agent to a running Willow Inference Server instance. It provides local, privacy-focused speech-to-text (ASR) and text-to-speech (TTS). Using Whisper for transcription and the server's synthetic voice models for speech, the skill lets the agent process audio input, summarize meetings, record memos, and give spoken responses with adjustable parameters such as speed and volume. It is intended for users who want self-hosted voice interaction without relying on external cloud services.
Installation
To install this skill, run the command: clawhub install openclaw/skills/skills/deantiwang/willow-inference-server. Before using the skill, you must ensure you have a running Willow Inference Server instance. Follow the provided setup guide in the repository to clone, configure, and initialize the server using the setup scripts. Once running, ensure the WILLOW_BASE_URL environment variable is correctly configured in your shell or the OpenClaw configuration file to point to your server instance (e.g., https://your-hostname:19000).
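A quick way to catch a missing or malformed WILLOW_BASE_URL before the skill is used is to check it in a small script. The following is a minimal sketch, not part of the skill itself: the function name resolve_base_url is hypothetical, and accepting both http and https is an assumption (the server typically enforces HTTPS).

```python
import os
from urllib.parse import urlsplit


def resolve_base_url(env=None):
    """Return the configured server URL, or raise ValueError if it looks wrong.

    Hypothetical helper for illustration; the skill may validate differently.
    """
    env = os.environ if env is None else env
    raw = env.get("WILLOW_BASE_URL", "").strip()
    if not raw:
        raise ValueError("WILLOW_BASE_URL is not set")
    parts = urlsplit(raw)
    # Assumption: only http/https URLs with a hostname are meaningful here.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"WILLOW_BASE_URL is not a valid URL: {raw!r}")
    # Drop any trailing slash so paths can be appended consistently.
    return raw.rstrip("/")
```

Calling resolve_base_url() with no arguments reads the current shell environment; passing a dict makes the check easy to exercise without touching real settings.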
Use Cases
This skill suits scenarios such as:
- Transcription: Transcribe recordings, meetings, or voice memos into text for documentation.
- Audio Feedback: Provide spoken responses from the agent for accessibility or hands-free operation.
- Voice Personalization: Use specific voice profiles such as 'af_sarah' or 'am_michael' for brand personas.
- Content Creation: Convert text drafts into audio files for podcasts or voiceovers.
Example Prompts
- "Transcribe this audio file for me: /path/to/meeting.mp3 and summarize the key action items."
- "Speak the following text using the am_michael voice at 1.1 speed: 'The system update is complete.'"
- "Listen to my latest audio recording and extract the main points into a markdown file."
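The second prompt above carries three pieces of information: the text, a voice name, and a speed. As a rough sketch of how such parameters might be collected and sanity-checked before a request is made, the example below uses a small dataclass. The class name SpeechRequest and the field names text, voice, and speed are illustrative assumptions, not the skill's or the server's actual request schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SpeechRequest:
    # Hypothetical field names; the real request schema may differ.
    text: str
    voice: str
    speed: float = 1.0

    def __post_init__(self):
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if not self.voice:
            raise ValueError("voice must not be empty")
        if self.speed <= 0:
            raise ValueError("speed must be positive")


# Parameters taken from the example prompt.
request = SpeechRequest(
    text="The system update is complete.",
    voice="am_michael",
    speed=1.1,
)
payload = asdict(request)
```

Validating up front keeps obviously bad input (empty text, a zero or negative speed) from ever reaching the server.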
Tips & Limitations
- Ensure your network connection is stable if the server runs on a separate machine.
- For best transcription results, use clean audio sources and specify the language code if known, rather than relying on 'auto'.
- TTS quality depends heavily on the model loaded on the server side.
- Always verify your server certificate setup, as Willow Inference Server typically enforces HTTPS.
- This skill does not store your audio logs; it acts as a direct pass-through, so make sure your local server instance has appropriate storage management for the generated audio output files.
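For the certificate point, one common situation is a server that uses a self-signed certificate. The sketch below shows one way a Python client could keep verification enabled while trusting that certificate, using only the standard ssl module. The function name build_tls_context and the idea of passing the certificate as a PEM file are assumptions for illustration; how the skill itself handles TLS is not documented here.

```python
import ssl


def build_tls_context(ca_file=None):
    """Create a verifying TLS context, optionally trusting an extra PEM file.

    Hypothetical helper; ca_file would be the server's certificate or its CA.
    """
    # Defaults: certificate verification and hostname checking are both on.
    context = ssl.create_default_context()
    if ca_file is not None:
        # Trust the given certificate in addition to the system CA store.
        context.load_verify_locations(cafile=ca_file)
    return context
```

The resulting context can be passed to standard-library clients, for example urllib.request.urlopen(url, context=build_tls_context("server.pem")). This keeps verification on instead of disabling it to work around certificate errors.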
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-deantiwang-willow-inference-server": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read
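If clawhub.json already contains other settings, pasting the snippet by hand risks overwriting them. A small merge script is one alternative. This is a sketch under assumptions: the function name enable_plugin is hypothetical, and the file is assumed to be plain JSON with a top-level "plugins" object as in the snippet above.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-deantiwang-willow-inference-server"


def enable_plugin(config_path):
    """Add or update the plugin entry, keeping any other settings in the file."""
    path = Path(config_path)
    # Start from the existing config if the file is present, else from empty.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling enable_plugin("clawhub.json") leaves unrelated keys and other plugin entries untouched and only sets the entry for this skill.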