Official | Verified | media | Safety 4/5

qwen3-audio

High-performance audio library for Apple Silicon with text-to-speech (TTS) and speech-to-text (STT).

Why use this skill?

Harness powerful TTS, STT, and voice cloning on your Apple Silicon Mac with Qwen3-Audio. Build custom voices and transcribe audio locally.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/darknoah/qwen3-audio

What This Skill Does

Qwen3-Audio is a high-performance audio processing suite built for Apple Silicon hardware (M1-M4). It provides native Text-to-Speech (TTS) and Speech-to-Text (STT), along with voice cloning, emotion-aware synthesis, and generative voice design, so synthesized speech can match specific linguistic and stylistic requirements. It serves as an all-in-one local audio processing engine for OpenClaw.

Installation

To install this skill, run the ClawHub CLI command clawhub install openclaw/skills/skills/darknoah/qwen3-audio. Before running the skill, confirm that your system is an Apple Silicon Mac with Python 3.10+ installed, then work through the checklist at ./references/env-check-list.md.
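
The two requirements stated above (Apple Silicon Mac, Python 3.10+) can be checked with a short script. This is a minimal sketch using only the Python standard library. The function name check_environment is hypothetical and not part of the skill, and it does not replace the checklist in ./references/env-check-list.md, whose contents are not shown here.

```python
import platform
import sys


def check_environment(system: str, machine: str, version: tuple) -> list:
    """Return a list of problems; an empty list means both requirements are met."""
    problems = []
    # Native Python on Apple Silicon reports Darwin/arm64. A Python binary
    # running under Rosetta reports x86_64 and is flagged here as well.
    if system != "Darwin" or machine != "arm64":
        problems.append(f"Apple Silicon Mac required, found {system}/{machine}")
    if tuple(version) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    issues = check_environment(
        platform.system(), platform.machine(), sys.version_info[:2]
    )
    for issue in issues:
        print("FAIL:", issue)
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```

Passing the platform values in as arguments keeps the check easy to exercise on any machine.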

Use Cases

  1. Automated Transcription: Efficiently process long-form audio files or meeting recordings into text formats like SRT or TXT for accessibility or documentation.
  2. Voice Branding: Clone a specific brand voice using reference samples to ensure consistent tone across all automated customer-facing audio responses.
  3. Content Creation: Generate natural-sounding audio content for video projects, podcasts, or accessibility features by providing simple text scripts and stylistic prompts.
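
For readers unfamiliar with the SRT format mentioned in the first use case, the sketch below shows how timed transcript segments map onto SRT cues. It is an illustration of the file format only. The (start, end, text) segment shape and the helper names srt_timestamp and to_srt are assumptions made for this example, not the skill's actual output or API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list) -> str:
    """Render (start_seconds, end_seconds, text) segments as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```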

Example Prompts

  1. "Convert this recording of our team meeting at ./recordings/meeting_01.wav into a synchronized SRT file to help me create subtitles for the video recap."
  2. "Create a new synthetic voice for my virtual assistant that sounds like a professional, calm, and friendly customer support representative using the description: 'A soft-spoken, empathetic middle-aged professional voice.'"
  3. "Synthesize the following text into an audio file: 'Welcome to our platform, please select an option from the menu.' Use the 'Ryan' speaker preset and make it sound energetic and welcoming."

Tips & Limitations

  • Optimization: This skill is built for Apple Silicon and runs on the MLX backend, so expect it to be significantly faster than standard CPU-based alternatives. Keep the Python environment clean, for example by using a dedicated virtual environment, so that conflicting packages do not interfere with the backend.
  • Voice Storage: Always organize your voices in the voices/ folder. For best cloning results, make sure ref_text.txt matches what is spoken in ref_audio.wav.
  • Limitations: Currently, this tool is restricted to Apple Silicon hardware. It does not support cloud-based synthesis, so your audio data stays local and private.
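
The voice storage tip can be checked before cloning. The sketch below assumes a layout of one subfolder per voice inside voices/, each holding ref_audio.wav and ref_text.txt. That layout and the function name find_incomplete_voices are assumptions for illustration, since the source names only the folder and the two files.

```python
from pathlib import Path


def find_incomplete_voices(voices_dir: Path) -> dict:
    """Map each voice folder name to the reference files it is missing."""
    required = ("ref_audio.wav", "ref_text.txt")
    incomplete = {}
    for voice in sorted(p for p in voices_dir.iterdir() if p.is_dir()):
        missing = [name for name in required if not (voice / name).is_file()]
        # An empty transcript cannot match the audio, so report it too.
        text = voice / "ref_text.txt"
        if text.is_file() and not text.read_text(encoding="utf-8").strip():
            missing.append("ref_text.txt (empty)")
        if missing:
            incomplete[voice.name] = missing
    return incomplete


if __name__ == "__main__":
    root = Path("voices")
    if root.is_dir():
        for name, missing in find_incomplete_voices(root).items():
            print(f"{name}: missing {', '.join(missing)}")
```

This only confirms that the files exist and that the transcript is non-empty. Whether the transcript actually matches the recording still needs a manual listen.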

Metadata

Author: @darknoah
Stars: 3376
Views: 0
Updated: 2026-03-24

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-darknoah-qwen3-audio": {
      "enabled": true,
      "auto_update": true
    }
  }
}
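
If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The following is a minimal sketch of merging the entry programmatically. It assumes clawhub.json is plain JSON with a top-level "plugins" object, as in the snippet above. The function name enable_plugin and the file location are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-darknoah-qwen3-audio"


def enable_plugin(config_path: Path) -> dict:
    """Add the plugin entry to clawhub.json, keeping any existing settings."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # setdefault preserves other plugins that are already configured.
    plugins = config.setdefault("plugins", {})
    plugins[PLUGIN_ID] = {"enabled": True, "auto_update": True}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Assumed location: clawhub.json in the current working directory.
    print(json.dumps(enable_plugin(Path("clawhub.json")), indent=2))
```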

Tags (AI)

#audio #tts #stt #apple-silicon #voice-cloning
Safety Score: 4/5

Flags: file-write, file-read, code-execution