Official Verified media Safety 4/5

audio-processing

Audio ingestion, analysis, transformation, and generation (Transcribe, TTS, VAD, Features).

Why use this skill?

Automate audio workflows with OpenClaw. Perform high-accuracy transcription, text-to-speech generation, feature extraction, and audio file transformation easily.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/iyeque/audio-processing

What This Skill Does

The audio-processing skill for OpenClaw handles audio ingestion, analysis, transformation, and generation. It integrates OpenAI Whisper, gTTS, and Librosa so that users can transcribe speech to text, generate synthetic speech, extract audio features, run voice activity detection (VAD), and apply signal transformations such as resampling or normalization. The skill is built to be modular and security-conscious: file operations are path-validated, and access to sensitive system directories is blocked.
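
The skill's own validation code is not shown on this page. As a minimal sketch of what path validation of this kind can look like, assuming a hypothetical allowed root directory and a hypothetical helper named `resolve_inside`, a containment check in Python might resemble the following:

```python
from pathlib import Path


def resolve_inside(root: str, candidate: str) -> Path:
    """Resolve `candidate` against `root` and reject paths that escape it.

    Hypothetical helper for illustration only; not the skill's real code.
    """
    root_path = Path(root).resolve()
    # An absolute `candidate` replaces root_path when joined, and ".."
    # segments are collapsed by resolve(); both cases are caught below.
    target = (root_path / candidate).resolve()
    try:
        target.relative_to(root_path)
    except ValueError:
        raise ValueError(f"{candidate!r} is outside the allowed directory") from None
    return target
```

Resolving before comparing is the important design choice here: a plain string prefix check would accept inputs such as `recordings/../../etc/passwd`.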

Installation

Install the skill with the CLI provided by the OpenClaw platform by running the following command in your terminal: clawhub install openclaw/skills/skills/iyeque/audio-processing. FFmpeg must be installed on your system, because VAD and audio transformation tasks depend on it. After installing, confirm that your environment runs Python 3.8 or later and that you have enough disk space for the OpenAI Whisper models, which range from roughly 100 MB to 3 GB depending on the model size.
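
The prerequisites above can be checked before the first run. The following is a small sketch using only the Python standard library; the function name `check_environment` and the 3 GB free-space default are assumptions chosen to mirror the figures quoted above, not documented requirements of the skill.

```python
import shutil
import sys


def check_environment(min_free_gb: float = 3.0, path: str = ".") -> dict:
    """Report whether the prerequisites named above appear to be met.

    The 3 GB default is an assumption based on the largest model size
    quoted on this page, not an official threshold.
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python_ok": sys.version_info >= (3, 8),
        "ffmpeg_found": shutil.which("ffmpeg") is not None,  # looks on PATH only
        "free_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

Note that `shutil.which` only finds an `ffmpeg` binary on the current PATH, and the disk check applies to the volume containing `path`, which may differ from where model files are actually cached.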

Use Cases

This skill suits developers and content creators who need to automate audio workflows. Use it to transcribe meeting recordings, generate voiceovers for automated video production, normalize podcast audio in bulk, or extract metrics such as MFCCs or RMS levels for classification tasks. It is particularly useful in automated pipelines where raw audio must be ingested and converted into actionable data.
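
For readers unfamiliar with the RMS metric mentioned above, here is a dependency-free sketch of frame-wise RMS. It is an illustration of the metric, not the skill's implementation, which lists Librosa among its libraries; the function name `frame_rms` and the frame and hop sizes are assumptions.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_length: int = 2048,
              hop_length: int = 512) -> List[float]:
    """Return the root-mean-square level of each frame of `samples`."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    levels = []
    # Signals shorter than one frame are treated as a single short frame.
    last_start = max(len(samples) - frame_length, 0)
    for start in range(0, last_start + 1, hop_length):
        frame = samples[start:start + frame_length]
        if not frame:
            break  # empty input: no frames to report
        levels.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return levels
```

A full-scale sine wave has an RMS of about 0.707, so values near that level indicate a loud, sustained signal, while values near zero indicate silence.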

Example Prompts

  1. "Transcribe the meeting recording at ./recordings/meeting_01.wav using the medium Whisper model and save the output."
  2. "Generate an audio file saying 'Welcome to the dashboard' using text-to-speech and save it as welcome_msg.mp3."
  3. "Take the file './input_clip.wav', trim it to the first 30 seconds, normalize the volume, and save the result as 'processed_clip.wav'."
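
To make the third prompt concrete, the sketch below shows one way a trim-then-normalize step could work using only the Python standard library. It assumes 16-bit PCM WAV input and interprets "normalize" as peak normalization; the skill itself may use FFmpeg and a different loudness target, and the function name `trim_and_normalize` is hypothetical.

```python
import array
import sys
import wave


def trim_and_normalize(src: str, dst: str, seconds: float = 30.0,
                       peak: float = 0.95) -> None:
    """Keep the first `seconds` of a 16-bit PCM WAV file and peak-normalize it."""
    if not 0.0 < peak <= 1.0:
        raise ValueError("peak must be in (0, 1]")

    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        # readframes() stops at end of file, so shorter clips are kept whole.
        raw = reader.readframes(int(seconds * params.framerate))

    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    loudest = max((abs(s) for s in samples), default=0)
    if loudest:  # skip scaling for empty or fully silent input
        gain = peak * 32767 / loudest
        samples = array.array("h", (int(round(s * gain)) for s in samples))

    if sys.byteorder == "big":
        samples.byteswap()
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # frame count in the header is fixed on close
        writer.writeframes(samples.tobytes())
```

For example, `trim_and_normalize("input_clip.wav", "processed_clip.wav")` would write a clip of at most 30 seconds whose loudest sample sits at 95% of full scale.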

Tips & Limitations

When transcribing, match the Whisper model to your constraints: 'tiny' and 'base' are fast but less accurate, whereas 'large' is the most accurate but needs considerably more processing time and RAM. For security, the skill prevents access to sensitive system directories, so always make sure your input file paths are valid. Transformation operations are supplied as a JSON string; a malformed string causes a syntax error at execution time, so check the formatting before you run. For large-scale batch processing, monitor your system's disk usage, because locally cached model files can accumulate over time.
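
Because malformed JSON only fails at execution time, it can help to validate the operations string first. The sketch below does this with Python's `json` module. The operation schema is not documented on this page, so the list-of-objects shape, the `"op"` key, and the helper name `parse_operations` are all hypothetical.

```python
import json
from typing import Any, Dict, List


def parse_operations(raw: str) -> List[Dict[str, Any]]:
    """Parse a JSON string of operations, failing early with a readable message.

    The expected shape (an array of objects with an "op" key) is an
    assumption for illustration; consult the skill for the real schema.
    """
    try:
        ops = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    if not isinstance(ops, list) or not all(
        isinstance(op, dict) and "op" in op for op in ops
    ):
        raise ValueError('expected a JSON array of objects, each with an "op" key')
    return ops
```

The most common mistakes this catches are single-quoted strings and trailing commas, both of which are valid in Python literals but not in JSON.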

Metadata

Author: @iyeque
Stars: 2032
Views: 1
Updated: 2026-03-05

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-iyeque-audio-processing": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#audio #transcription #tts #analysis #signal-processing

Safety Score: 4/5

Flags: file-write, file-read, external-api, code-execution