qwen-audio
High-performance audio library with text-to-speech (TTS) and speech-to-text (STT).
What This Skill Does
Qwen-Audio is an audio library integrated into the OpenClaw ecosystem that converts between text and speech, providing both Text-to-Speech (TTS) and Speech-to-Text (STT) processing. Its central feature is the creation of customized, reusable voice profiles. Using the VoiceDesign model, you can synthesize speech with a specified emotional tone, gender, or professional style, which makes the skill suitable for content creation, accessibility features, and personalized assistant interactions. The skill manages voice data locally in structured directories, so profiles are stored on your own machine.
Installation
To integrate this skill into your OpenClaw environment, execute the following command in your terminal:
clawhub install openclaw/skills/skills/darknoah/qwen-audio
Ensure that you have Python 3.10 or higher installed on your system. Before initializing any audio tasks, navigate to the skill root and verify that all prerequisites listed in ./references/env-check-list.md are satisfied to avoid runtime configuration errors.
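Both prerequisites above can be checked with a short script before any audio task runs. The sketch below is an illustration, not part of the skill. It assumes it is called from the skill root, and it only verifies the Python version and that the checklist file exists. The individual items in env-check-list.md still need to be reviewed by hand.

```python
from __future__ import annotations  # keeps the annotations harmless on older Pythons

import sys
from pathlib import Path

# Minimum interpreter version stated in the installation notes.
MIN_PYTHON = (3, 10)


def preflight(skill_root: str = ".") -> list[str]:
    """Return the problems found; an empty list means these basic checks passed.

    Only the Python version and the presence of the checklist file are checked;
    the checklist's individual items are not evaluated.
    """
    problems: list[str] = []
    if sys.version_info < MIN_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.10+ required, found {found}")
    checklist = Path(skill_root) / "references" / "env-check-list.md"
    if not checklist.is_file():
        problems.append(f"missing checklist: {checklist}")
    return problems
```

For example, calling print(preflight()) from the skill root prints an empty list when both checks pass.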
Use Cases
- Content Creation: Convert written scripts, blog posts, or long-form documents into natural-sounding audio files for podcasting or accessibility.
- Custom Voice Branding: Create distinct, branded voice identities for automated customer support or interactive agents.
- Meeting Transcription: Use the STT capabilities to transcribe audio recordings into clean, formatted text logs for documentation.
- Interactive AI: Add a voice interface to your custom OpenClaw agents, allowing them to communicate verbally with users.
Example Prompts
- "I want to create a new voice for my assistant. I'm looking for a warm, professional female voice. Can you help me set that up?"
- "List all the available voice profiles currently stored in the qwen-audio skill."
- "Convert this text document into an audio file using my 'broadcast-pro' voice profile."
Tips & Limitations
- Pre-check requirement: Always run the "voice list" command before attempting "tts" generation. If no voices exist, you must create one first; the agent will guide you through this if you ask.
- Performance: For best results, make sure your reference audio files are clean and free of background noise, because they are the foundation of the synthesized voice's quality.
- Resource Management: Since voices are stored in local directories, monitor your storage if you create a high volume of unique profiles.
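The pre-check and storage advice above can also be scripted. The sketch below assumes, hypothetically, that each voice profile is a subdirectory of a single voices directory. The skill's actual storage layout is not documented here, so VOICES_DIR is a placeholder to adjust for your installation. The helpers mirror the intent of the "voice list" command but do not call it.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical location: the directory qwen-audio actually uses may differ.
VOICES_DIR = Path("voices")


def list_voices(voices_dir: Path = VOICES_DIR) -> list[str]:
    """Return profile names, assuming one subdirectory per voice profile."""
    if not voices_dir.is_dir():
        return []
    return sorted(p.name for p in voices_dir.iterdir() if p.is_dir())


def require_voice(name: str, voices_dir: Path = VOICES_DIR) -> None:
    """Raise before TTS generation if no profiles exist or the requested one is missing."""
    voices = list_voices(voices_dir)
    if not voices:
        raise RuntimeError("No voice profiles found; create one before generating speech.")
    if name not in voices:
        raise RuntimeError(f"Unknown voice {name!r}; available: {', '.join(voices)}")


def storage_bytes(voices_dir: Path = VOICES_DIR) -> dict[str, int]:
    """Total on-disk size of each profile directory, in bytes (recursive)."""
    sizes: dict[str, int] = {}
    for name in list_voices(voices_dir):
        files = (f for f in (voices_dir / name).rglob("*") if f.is_file())
        sizes[name] = sum(f.stat().st_size for f in files)
    return sizes
```

For example, require_voice("broadcast-pro") raises a RuntimeError until a profile with that name exists, which matches the advice to create a voice first. storage_bytes() gives a per-profile size report for keeping an eye on disk usage.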
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-darknoah-qwen-audio": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, file-write, code-execution
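If clawhub.json already lists other plugins, pasting the plugin snippet above wholesale can overwrite them. The minimal Python sketch below merges only this plugin's entry into an existing file. It assumes clawhub.json is plain JSON with a top-level "plugins" object, as the snippet suggests; the default file path is an assumption to adjust for your setup.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-darknoah-qwen-audio"


def enable_plugin(config_path: str = "clawhub.json") -> dict:
    """Add or update the qwen-audio entry while preserving other plugins and keys."""
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    # Keep any existing per-plugin settings, then set the two flags from the snippet.
    entry = plugins.setdefault(PLUGIN_ID, {})
    entry.update({"enabled": True, "auto_update": True})
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Merging with setdefault rather than replacing the whole "plugins" object is the design choice here: entries for other plugins and any unrelated top-level keys survive the update.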
Related Skills
qwen3-audio
High-performance audio library for Apple Silicon with text-to-speech (TTS) and speech-to-text (STT).
free-resource
Search and retrieve royalty-free media from Pixabay (images/videos), Freesound (audio effects), and Jamendo (music/BGM). Use when the user needs to find stock photos, illustrations, vectors, videos, sound effects, or background music, download media, or query media libraries with filters.
Rednote Cli
Skill by darknoah
redact
Privacy redaction toolkit for images, PDFs, Word documents, and PowerPoint presentations. Use when the user needs to redact, mask, or replace sensitive/private information in files. Triggers: redacting or masking sensitive text in images, PDFs, documents, or presentations; replacing names, phone numbers, IDs, or other PII in files; processing documents for privacy compliance before sharing; anonymizing content in visual files. Supported formats: png/jpg images, PDF, docx/doc, pptx/ppt.