Official Verified media Safety 4/5

qwen-audio

High-performance audio library with text-to-speech (TTS) and speech-to-text (STT).

What This Skill Does

Qwen-Audio is an audio library for the OpenClaw ecosystem that handles both Text-to-Speech (TTS) and Speech-to-Text (STT). Its main feature is custom, reusable voice profiles: with the VoiceDesign model, you can synthesize audio that matches a specific emotional tone, gender, or professional style. This makes it suited to content creation, accessibility features, and personalized assistant interactions. Voice data is stored locally in structured directories, so it stays on your machine.

Installation

To integrate this skill into your OpenClaw environment, execute the following command in your terminal:

clawhub install openclaw/skills/skills/darknoah/qwen-audio

Ensure that you have Python 3.10 or higher installed on your system. Before initializing any audio tasks, navigate to the skill root and verify that all prerequisites listed in ./references/env-check-list.md are satisfied to avoid runtime configuration errors.
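The two prerequisites above can be checked with a short script. This is a minimal sketch, not part of the skill itself: the helper names `python_is_supported` and `checklist_exists` are hypothetical, and it assumes you run it from the skill root so that ./references/env-check-list.md resolves. It only confirms that the checklist file is present; the items in that file still need to be reviewed by hand.

```python
import sys
from pathlib import Path

MIN_PYTHON = (3, 10)  # minimum version stated in the installation notes


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def checklist_exists(skill_root):
    """Return True when references/env-check-list.md exists under skill_root."""
    return (Path(skill_root) / "references" / "env-check-list.md").is_file()


if __name__ == "__main__":
    root = Path.cwd()  # assumption: the script is run from the skill root
    print("Python 3.10+:", python_is_supported())
    print("Checklist found:", checklist_exists(root))
```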

Use Cases

Content Creation: Convert written scripts, blog posts, or long-form documents into natural-sounding audio files for podcasting or accessibility.
Custom Voice Branding: Create distinct, branded voice identities for automated customer support or interactive agents.
Meeting Transcription: Utilize the STT capabilities to transcribe audio recordings into clean, formatted text logs for documentation.
Interactive AI: Add a voice interface to your custom OpenClaw agents, allowing them to communicate verbally with users.

Example Prompts

"I want to create a new voice for my assistant. I'm looking for a warm, professional female voice. Can you help me set that up?"
"List all the available voice profiles currently stored in the qwen-audio skill."
"Convert this text document into an audio file using my 'broadcast-pro' voice profile."

Tips & Limitations

Pre-check requirement: Always run the voice list command before attempting TTS generation. If no voices exist, you must create one first; the agent will guide you through this if you ask.
Audio quality: Use reference audio files that are clean and free of background noise, because they are the foundation of the synthesized voice's quality.
Resource Management: Since voices are stored in local directories, monitor your storage if you create a high volume of unique profiles.
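The pre-check and storage tips above can be approximated outside the agent with a few lines of Python. This is a sketch under stated assumptions: the listing does not document the on-disk layout, so the `voices` directory name and the one-subdirectory-per-profile structure are hypothetical, as are the helper names. It does not call the skill's own voice list command.

```python
from pathlib import Path


def list_voice_profiles(voices_dir):
    """Return sorted subdirectory names, treating each as one voice profile (assumed layout)."""
    root = Path(voices_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def ready_for_tts(voices_dir, voice_name):
    """Pre-check: True only when the requested profile already exists."""
    return voice_name in list_voice_profiles(voices_dir)


def directory_size_bytes(path):
    """Sum the sizes of all regular files under path, recursively."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


if __name__ == "__main__":
    voices = Path("voices")  # hypothetical location; adjust to your install
    names = list_voice_profiles(voices)
    if not names:
        print("No voices found; create one before running TTS.")
    for name in names:
        print(name, directory_size_bytes(voices / name), "bytes")
```

Running the size check periodically gives an early signal when a large number of profiles starts to consume disk space.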

Metadata

Author: @darknoah

Stars: 3376

Updated: 2026-03-24

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-darknoah-qwen-audio": {
      "enabled": true,
      "auto_update": true
    }
  }
}
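If clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. The following sketch merges the entry programmatically instead. It assumes only what the snippet shows (a top-level "plugins" object keyed by plugin ID); the file's location and any other keys are not documented here, and the function names are hypothetical.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-darknoah-qwen-audio"
PLUGIN_SETTINGS = {"enabled": True, "auto_update": True}


def enable_plugin(config):
    """Return a copy of config with the qwen-audio entry added under "plugins"."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))  # keep any existing plugin entries
    plugins[PLUGIN_ID] = dict(PLUGIN_SETTINGS)
    merged["plugins"] = plugins
    return merged


def update_config_file(path):
    """Load path if it exists, merge the plugin entry, and write the result back."""
    config_path = Path(path)
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    merged = enable_plugin(config)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

Calling `update_config_file("clawhub.json")` from the directory that holds the file leaves other plugin entries untouched and creates the file if it is missing.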

Tags (AI)

#audio #tts #stt #voice-synthesis #ai-audio

Safety Score: 4/5

Flags: file-read, file-write, code-execution
