audio-processing
Audio ingestion, analysis, transformation, and generation (Transcribe, TTS, VAD, Features).
Why use this skill?
Automate audio workflows with OpenClaw: speech transcription, text-to-speech generation, feature extraction, and audio file transformation.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/iyeque/audio-processing
What This Skill Does
The audio-processing skill for OpenClaw is a toolkit for audio ingestion, analysis, and generation. It builds on OpenAI Whisper, gTTS, and Librosa to transcribe speech to text, generate synthetic speech, extract audio features, perform voice activity detection (VAD), and apply signal transformations such as resampling and normalization. The skill is modular, and it validates file paths before performing any file operation.
Installation
To add this skill to your OpenClaw environment, run the following command in your terminal:
clawhub install openclaw/skills/skills/iyeque/audio-processing
FFmpeg must be installed on your system, because VAD and audio transformation depend on it. The skill requires Python 3.8 or later and enough free disk space for the OpenAI Whisper models, which range from about 100 MB to 3 GB.
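The prerequisites above can be checked before installing. The following is a minimal sketch using only the Python standard library; the function name check_environment and the 3 GB free-space threshold are assumptions for illustration, not part of the skill.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)            # requirement stated in the installation notes
MIN_FREE_BYTES = 3 * 1024**3   # assumption: leave room for the largest (~3 GB) model


def check_environment(cache_dir="."):
    """Return a dict of checks for the prerequisites listed above."""
    free = shutil.disk_usage(cache_dir).free
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "ffmpeg_found": shutil.which("ffmpeg") is not None,  # looks for ffmpeg on PATH
        "disk_ok": free >= MIN_FREE_BYTES,
        "free_gb": round(free / 1024**3, 1),
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

Pass the directory where models will be cached as cache_dir so the disk check measures the right volume.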
Use Cases
This skill is aimed at developers and content creators who need to automate audio workflows. Use it to transcribe meeting recordings, generate voiceovers for automated video production, normalize podcast audio in bulk, or extract metrics such as MFCCs and RMS levels for classification tasks. It fits well in automated pipelines where raw audio must be ingested and converted into structured data.
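To make two of the terms above concrete, here is a small pure-Python sketch of an RMS level (expressed in dBFS) and peak normalization. It assumes samples are floats in the range [-1.0, 1.0]; it illustrates the arithmetic only and is not the skill's implementation, which is described as using Librosa.

```python
import math


def rms(samples):
    """Root-mean-square level of float samples in the range [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def to_dbfs(level):
    """Convert a linear level to decibels relative to full scale (1.0)."""
    return float("-inf") if level <= 0 else 20.0 * math.log10(level)


def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    clip = [0.0, 0.25, -0.5, 0.25]
    print(round(to_dbfs(rms(clip)), 2))
    print(peak_normalize(clip))  # [0.0, 0.5, -1.0, 0.5]
```

Peak normalization only guarantees that the loudest sample reaches the target; clips normalized this way can still differ in perceived loudness, which is why RMS is reported separately.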
Example Prompts
- "Transcribe the meeting recording at ./recordings/meeting_01.wav using the medium Whisper model and save the output."
- "Generate an audio file saying 'Welcome to the dashboard' using text-to-speech and save it as welcome_msg.mp3."
- "Take the file './input_clip.wav', trim it to the first 30 seconds, normalize the volume, and save the result as 'processed_clip.wav'."
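For orientation, the first two prompts correspond roughly to the library calls below. This is a sketch that calls OpenAI Whisper and gTTS directly, assuming both packages are installed (pip install openai-whisper gTTS); the function names transcribe and speak are hypothetical, and the skill's own interface may differ.

```python
# Assumption: openai-whisper and gTTS are installed. These helpers call the
# libraries directly and do not represent the skill's own interface.

WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


def transcribe(path, model_size="medium"):
    """Transcribe an audio file with Whisper and return the recognized text."""
    if model_size not in WHISPER_SIZES:
        raise ValueError(f"unknown model size: {model_size!r}")
    import whisper  # imported lazily; the first load of a model downloads it

    model = whisper.load_model(model_size)
    result = model.transcribe(str(path))
    return result["text"].strip()


def speak(text, out_path, lang="en"):
    """Synthesize text to an MP3 file with gTTS (requires network access)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    from gtts import gTTS  # imported lazily so validation works without it

    gTTS(text=text, lang=lang).save(str(out_path))
    return out_path
```

With those helpers, transcribe("./recordings/meeting_01.wav", "medium") would return the transcript as a string, and speak("Welcome to the dashboard", "welcome_msg.mp3") would write the MP3 file.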
Tips & Limitations
- Choose the transcription model deliberately: 'tiny' and 'base' are fast but less accurate, whereas 'large' is the most accurate but needs considerably more processing time and RAM.
- For security, the skill blocks access to sensitive system directories. Make sure your input file paths are valid.
- Transformation operations are passed as a JSON string. Malformed JSON causes a syntax error at execution time, so check the formatting first.
- For large batch jobs, monitor disk usage: locally cached model files can accumulate over time.
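The path and JSON checks mentioned above can be approximated on the caller's side before a request reaches the skill. The sketch below uses only the standard library. The helper names and the operation schema (a JSON array of objects such as {"op": "trim", "end": 30}) are assumptions for illustration; the skill's actual validation rules and schema are not documented here.

```python
import json
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Resolve user_path and reject it if it falls outside base_dir."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()  # collapses ".." and symlinks
    if candidate != base and base not in candidate.parents:
        raise PermissionError(f"path escapes {base}: {user_path}")
    return candidate


def parse_operations(ops_json):
    """Parse a JSON string of operations and fail early with a clear message."""
    try:
        ops = json.loads(ops_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid operations JSON: {exc}") from exc
    # Hypothetical schema: a JSON array whose items are objects.
    if not isinstance(ops, list) or not all(isinstance(op, dict) for op in ops):
        raise ValueError("operations must be a JSON array of objects")
    return ops
```

A common formatting mistake is single-quoted keys, as in [{'op': 'trim'}], which json.loads rejects; parse_operations turns that into a ValueError before any audio is touched.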
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-iyeque-audio-processing": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, external-api, code-execution
Related Skills
unified-web-search
Pick the best source (Tavily, Web Search Plus, Browser, or local files) for a query, run the search, and return ranked results with provenance.
local-system-info
Return system metrics (CPU, RAM, disk, processes) using psutil.
device-control
Expose safe device actions (volume, brightness, open/close apps) for personal automation.