Official Verified media Safety 4/5

audio-processing

Audio ingestion, analysis, transformation, and generation (Transcribe, TTS, VAD, Features).

Why use this skill?

Automate audio workflows with OpenClaw. Perform high-accuracy transcription, text-to-speech generation, feature extraction, and audio file transformation easily.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/iyeque/audio-processing

What This Skill Does

The audio-processing skill for OpenClaw is a toolkit for audio ingestion, analysis, and generation. It builds on OpenAI Whisper, gTTS, and Librosa to transcribe speech to text, generate synthetic speech, extract technical audio features, perform voice activity detection (VAD), and apply signal transformations such as resampling or normalization. It is designed to be modular and secure: all file operations are path-validated, so reads and writes stay within permitted locations.
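The path validation mentioned above can be illustrated with a small sketch. This is not the skill's actual implementation, which is not shown on this page. The function name `resolve_inside` and the idea of a single allowed base directory are assumptions made for illustration.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path against base_dir and reject anything that escapes it.

    Hypothetical helper: the skill's real validation rules may differ.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so that case is
    # caught by the containment check below as well.
    candidate = (base / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not under base.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes allowed directory: {user_path}") from None
    return candidate
```

A call such as `resolve_inside("./workspace", "recordings/meeting_01.wav")` returns an absolute path under the workspace, while `resolve_inside("./workspace", "../secrets.txt")` raises `ValueError`.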

Installation

To add this skill to your OpenClaw environment, use the clawhub CLI. FFmpeg must be installed on your system, since VAD and audio transformation depend on it. Install the skill by running the following command in your terminal: clawhub install openclaw/skills/skills/iyeque/audio-processing. After installation, confirm that your environment runs Python 3.8 or later and that you have enough disk space for the OpenAI Whisper models, which range from roughly 100MB to 3GB depending on the model.
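The prerequisites above can be checked before installing. The following is a minimal sketch using only the Python standard library. The 3 GB free-space threshold is an assumption based on the largest model size quoted above, and `check_environment` is a hypothetical helper, not part of the skill.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)            # requirement stated in the installation notes
MIN_FREE_BYTES = 3 * 1024**3   # assumption: headroom for the largest model


def check_environment(cache_dir: str = ".") -> dict:
    """Report whether Python, FFmpeg, and free disk space look sufficient."""
    free = shutil.disk_usage(cache_dir).free
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # shutil.which returns None when ffmpeg is not on PATH.
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
        "disk_ok": free >= MIN_FREE_BYTES,
        "free_gb": round(free / 1024**3, 1),
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

Pass the directory where models will be cached as `cache_dir` if it lives on a different volume from the current directory.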

Use Cases

This skill is ideal for developers and content creators who need to automate audio workflows. Use it to transcribe meeting recordings, generate voiceovers for automated video production, perform bulk audio normalization for podcasts, or analyze audio files to extract metrics like MFCCs or RMS levels for classification tasks. It is particularly effective in building automated pipeline applications where raw audio must be ingested and converted into actionable data.
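To make the RMS metric mentioned above concrete, here is a plain-Python sketch of how an RMS level is computed from samples. It is a stand-in for illustration only; the skill itself relies on Librosa for feature extraction, and the helper names `rms` and `rms_dbfs` are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square level of samples normalized to [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_dbfs(samples):
    """RMS level in dB relative to full scale; -inf for silence."""
    level = rms(samples)
    return 20 * math.log10(level) if level > 0 else float("-inf")


# One second of a 440 Hz sine wave at half amplitude, sampled at 16 kHz.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]

print(round(rms(tone), 3))       # 0.354, i.e. amplitude / sqrt(2)
print(round(rms_dbfs(tone), 1))  # -9.0
```

For a pure sine wave the RMS equals the peak amplitude divided by the square root of two, which gives a quick sanity check on any RMS value the skill reports.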

Example Prompts

"Transcribe the meeting recording at ./recordings/meeting_01.wav using the medium Whisper model and save the output."
"Generate an audio file saying 'Welcome to the dashboard' using text-to-speech and save it as welcome_msg.mp3."
"Take the file './input_clip.wav', trim it to the first 30 seconds, normalize the volume, and save the result as 'processed_clip.wav'."

Tips & Limitations

When transcribing, pick the Whisper model to match your constraints: 'tiny' and 'base' are fast but less accurate, whereas 'large' is the most accurate but needs significantly more processing time and RAM. For security, the skill blocks access to sensitive system directories, so make sure your input file paths are valid. Transformation operations are supplied as JSON strings, and malformed JSON will cause errors at execution time, so check the formatting before running. For large-scale batch processing, monitor disk usage, since locally cached model files can accumulate over time.
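One way to catch malformed operation strings early is to parse them before handing them to the skill. The sketch below assumes a hypothetical schema, a JSON array of objects each carrying an "op" field, and hypothetical operation names. The skill's real schema is not documented on this page and may differ.

```python
import json

# Hypothetical operation names, for illustration only.
KNOWN_OPS = {"trim", "normalize", "resample"}


def parse_operations(raw: str) -> list:
    """Parse a JSON string of operations and fail early with a clear message."""
    try:
        ops = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
    if not isinstance(ops, list):
        raise ValueError("expected a JSON array of operations")
    for index, op in enumerate(ops):
        name = op.get("op") if isinstance(op, dict) else None
        if not isinstance(name, str) or name not in KNOWN_OPS:
            raise ValueError(f"operation {index} has no recognized 'op' field")
    return ops


ops = parse_operations('[{"op": "trim", "end": 30}, {"op": "normalize"}]')
print([op["op"] for op in ops])  # ['trim', 'normalize']
```

A common mistake is using single quotes inside the JSON itself, as in `[{'op': 'trim'}]`. That is valid Python but not valid JSON, and `parse_operations` rejects it with a `ValueError`.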

Metadata

Author: @iyeque

Stars: 2032

Updated: 2026-03-05

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-iyeque-audio-processing": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#audio #transcription #tts #analysis #signal-processing

Safety Score: 4/5

Flags: file-write, file-read, external-api, code-execution