qwen3-audio
High-performance audio library for Apple Silicon with text-to-speech (TTS) and speech-to-text (STT).
Why use this skill?
Harness powerful TTS, STT, and voice cloning on your Apple Silicon Mac with Qwen3-Audio. Build custom voices and transcribe audio locally.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/darknoah/qwen3-audio
What This Skill Does
Qwen3-Audio is an audio processing suite built for Apple Silicon hardware (M1-M4). It provides native Text-to-Speech (TTS) and Speech-to-Text (STT), along with voice cloning, emotion-aware synthesis, and generative voice design, so users can synthesize speech that matches specific linguistic and stylistic requirements. It serves as an all-in-one local audio processing engine for OpenClaw.
Installation
To install this skill, use the ClawHub CLI: clawhub install openclaw/skills/skills/darknoah/qwen3-audio. Before running it, work through the checklist at ./references/env-check-list.md, and confirm that your system is an Apple Silicon Mac with Python 3.10+ installed.
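The two requirements named above (Apple Silicon and Python 3.10+) can be checked from Python itself. The following is a minimal sketch, not part of the skill: the function name check_environment is hypothetical, and it does not replace the checklist in ./references/env-check-list.md, whose contents are not shown here.

```python
import platform
import sys


def check_environment(system=None, machine=None, version=None):
    """Return a list of problems; an empty list means both requirements are met.

    The arguments default to the running interpreter's values and exist only
    so the check can be exercised with other values (hypothetical helper).
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    version = tuple(version) if version is not None else sys.version_info[:2]

    problems = []
    # macOS on Apple Silicon reports "Darwin" and "arm64". An interpreter
    # running under Rosetta reports "x86_64" and is flagged here as well.
    if system != "Darwin" or machine != "arm64":
        problems.append(f"Apple Silicon Mac required, found {system}/{machine}")
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```

Returning a list of problems instead of raising on the first one lets a setup script report everything that needs fixing in a single pass.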
Use Cases
- Automated Transcription: Efficiently process long-form audio files or meeting recordings into text formats like SRT or TXT for accessibility or documentation.
- Voice Branding: Clone a specific brand voice using reference samples to ensure consistent tone across all automated customer-facing audio responses.
- Content Creation: Generate natural-sounding audio content for video projects, podcasts, or accessibility features by providing simple text scripts and stylistic prompts.
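For the transcription use case, it may help to see what an SRT file looks like. The sketch below formats timed text segments as SRT. It assumes the transcript is already available as (start_seconds, end_seconds, text) tuples; that input shape and the function names are illustrative and do not describe the skill's actual output or API.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start, end, text) tuples (assumed input shape)."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: sequence number, time range, then the subtitle text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Welcome to the meeting."), (2.5, 5.0, "Let's begin.")]))
```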
Example Prompts
- "Convert this recording of our team meeting at ./recordings/meeting_01.wav into a synchronized SRT file to help me create subtitles for the video recap."
- "Create a new synthetic voice for my virtual assistant that sounds like a professional, calm, and friendly customer support representative using the description: 'A soft-spoken, empathetic middle-aged professional voice.'"
- "Synthesize the following text into an audio file: 'Welcome to our platform, please select an option from the menu.' Use the 'Ryan' speaker preset and make it sound energetic and welcoming."
Tips & Limitations
- Optimization: The skill runs on the MLX backend, which targets Apple Silicon, so it should be noticeably faster than CPU-only alternatives. Keep the Python environment clean (for example, by using a dedicated virtual environment) so that conflicting packages do not get in the way.
- Voice Storage: Always organize your voices in the voices/ folder. Ensure ref_audio.wav and ref_text.txt are aligned for best cloning results.
- Limitations: Currently, this tool is restricted to Apple Silicon hardware. It does not support cloud-based synthesis, ensuring your audio data stays local and private.
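The Voice Storage tip can be backed by a quick check before cloning. The sketch below assumes one subfolder per voice inside voices/, each holding ref_audio.wav and ref_text.txt. The per-voice subfolder layout is an assumption (the tip only names the folder and the two files), and find_incomplete_voices is a hypothetical helper, not part of the skill.

```python
from pathlib import Path

# File names taken from the tip above; the per-voice subfolder layout is assumed.
REQUIRED_FILES = ("ref_audio.wav", "ref_text.txt")


def find_incomplete_voices(voices_dir="voices"):
    """Map each voice folder name to the reference files it is missing."""
    root = Path(voices_dir)
    missing = {}
    if not root.is_dir():
        return missing
    for voice in sorted(p for p in root.iterdir() if p.is_dir()):
        absent = [name for name in REQUIRED_FILES if not (voice / name).is_file()]
        if absent:
            missing[voice.name] = absent
    return missing


if __name__ == "__main__":
    for name, files in find_incomplete_voices().items():
        print(f"{name}: missing {', '.join(files)}")
```

This only confirms that both files are present. Whether the text in ref_text.txt actually matches what is spoken in ref_audio.wav still has to be verified by listening.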
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-darknoah-qwen3-audio": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, code-execution
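If clawhub.json already contains other plugins, the plugin entry shown under Metadata has to be merged in without overwriting them. A minimal sketch of that merge follows. The plugin key and its two settings come from the JSON fragment in this listing; the file location (clawhub.json in the current directory) and both function names are assumptions.

```python
import json
from pathlib import Path

# Key and settings copied from the JSON fragment in this listing.
PLUGIN_KEY = "official-darknoah-qwen3-audio"
PLUGIN_SETTINGS = {"enabled": True, "auto_update": True}


def enable_plugin(config):
    """Return a copy of config with the plugin entry added or replaced."""
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))
    plugins[PLUGIN_KEY] = dict(PLUGIN_SETTINGS)
    updated["plugins"] = plugins
    return updated


def update_config_file(path="clawhub.json"):
    """Read the config (assumed path), merge the entry, and write it back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n")
```

Calling update_config_file() rewrites the file in place, so it is worth keeping a backup of any hand-edited configuration first.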