Official Verified

audio-processing

Audio ingestion, analysis, transformation, and generation (Transcribe, TTS, VAD, Features).

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/iyeque/iyeque-audio-processing

Download Source Code (.zip)

Audio Processing Skill

A comprehensive toolset for audio manipulation and analysis with security validations.

Security

File paths are validated to prevent path traversal attacks
Access to system directories (/etc, /proc, /sys, /root) is blocked
TTS text input is limited to 10,000 characters
All file operations use resolved absolute paths

Tool API

audio_tool

Perform audio operations like transcription, text-to-speech, and feature extraction.

Parameters:
- action (string, required): One of transcribe, tts, extract_features, vad_segments, transform.
- file_path (string, optional): Path to input audio file.
- text (string, optional): Text for TTS (max 10,000 chars).
- output_path (string, optional): Path for output file (default: auto-generated).
- model (string, optional): Whisper model size (tiny, base, small, medium, large). Default: base.
- ops (string, optional): JSON string of operations for transform action.

Usage:

# Transcribe audio file
uv run --with "openai-whisper" --with "pydub" --with "numpy" skills/audio-processing/tool.py transcribe --file_path input.wav

# Transcribe with specific model
uv run --with "openai-whisper" skills/audio-processing/tool.py transcribe --file_path input.wav --model small

# Text-to-speech
uv run --with "gTTS" skills/audio-processing/tool.py tts --text "Hello world" --output_path hello.mp3

# Extract audio features
uv run --with "librosa" --with "numpy" --with "soundfile" skills/audio-processing/tool.py extract_features --file_path input.wav

# Voice activity detection (find speech segments)
uv run --with "pydub" skills/audio-processing/tool.py vad_segments --file_path input.wav

# Transform audio (trim, resample, normalize)
uv run --with "pydub" skills/audio-processing/tool.py transform --file_path input.wav --ops '[{"op": "trim", "start": 10, "end": 30}, {"op": "normalize"}]'

Actions

transcribe

Convert speech to text using OpenAI Whisper.

Returns: { "text": "...", "segments": [...] }
Models: tiny, base, small, medium, large (larger = more accurate, slower)

tts

Generate speech from text using Google TTS.

Returns: { "file_path": "output.mp3", "status": "created" }
Language: English (default)

extract_features

Extract audio features for analysis.

Returns: duration, sample_rate, mfcc_mean, rms_mean
Useful for audio classification, quality analysis

vad_segments

Detect speech segments using silence detection.

Returns: { "segments": [{ "start": 0.5, "end": 3.2 }, ...] }
Uses FFmpeg silencedetect filter
Aggressiveness: 1-3 (default: 2)

transform

Apply transformations to audio files.

Operations: trim, resample, normalize
Returns: { "file_path": "output.wav" }

Requirements

ffmpeg: Required for VAD and transform operations
Python 3.8+: All operations
Disk Space: Whisper models range from 100MB (tiny) to 3GB (large)

Error Handling

Read Full Documentation on GitHub

Metadata

Author@iyeque

Stars2032

Updated2026-03-05

View Author Profile

AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-iyeque-iyeque-audio-processing": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

unified-web-search

Pick the best source (Tavily, Web Search Plus, Browser, or local files) for a query, run the search, and return ranked results with provenance.

iyeque 2032

local-system-info

Return system metrics (CPU, RAM, disk, processes) using psutil.

iyeque 2032

device-control

Expose safe device actions (volume, brightness, open/close apps) for personal automation.

iyeque 2032