Official Verified media Safety 4/5

Audio

Process, enhance, and convert audio files with noise removal, normalization, format conversion, transcription, and podcast workflows.

Why use this skill?

Enhance, convert, and transcribe audio with the OpenClaw Audio skill. It builds on FFmpeg and Whisper and supports podcast workflows, so common media-editing tasks can be handled from one place.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/ivangdavila/audio


What This Skill Does

The Audio skill for OpenClaw is a robust, command-line-driven assistant for end-to-end audio processing. It acts as an interface to tools such as FFmpeg, SoX, Whisper, and Demucs, letting users convert formats, normalize loudness to industry standards, extract audio from video, and perform more complex tasks such as source separation and speech-to-text transcription. It follows a structured execution pattern: it analyzes the source file with ffprobe before applying any transformation, which keeps results consistent and high quality. Whether you are preparing a file for a podcast, archiving high-fidelity audio, or converting media for web compatibility, the skill automates the heavy lifting.
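The "probe first" step can be sketched as follows. This is a minimal illustration, not the skill's own code: the function names `probe_command`, `summarize_probe`, and `probe` are hypothetical, and the sketch assumes ffprobe is on PATH. The ffprobe flags used (`-v error`, `-show_format`, `-show_streams`, `-of json`) are standard ffprobe options.

```python
import json
import subprocess


def probe_command(path: str) -> list[str]:
    """Build an ffprobe call that reports container and stream info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-show_format", "-show_streams",
        "-of", "json",
        path,
    ]


def summarize_probe(probe_json: str) -> dict:
    """Extract the fields that later transformations typically depend on."""
    data = json.loads(probe_json)
    streams = data.get("streams", [])
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    if not audio:
        raise ValueError("no audio stream found")
    first = audio[0]
    fmt = data.get("format", {})
    return {
        "codec": first.get("codec_name"),
        # ffprobe reports sample_rate and duration as strings, e.g. "48000"
        "sample_rate": int(first["sample_rate"]) if "sample_rate" in first else None,
        "channels": first.get("channels"),
        "duration": float(fmt["duration"]) if "duration" in fmt else None,
        # True when the input is a video file that still needs audio extraction
        "has_video": any(s.get("codec_type") == "video" for s in streams),
    }


def probe(path: str) -> dict:
    """Run ffprobe (hypothetical helper; requires ffprobe on PATH) and summarize it."""
    result = subprocess.run(probe_command(path), capture_output=True, text=True, check=True)
    return summarize_probe(result.stdout)
```

Keeping command construction separate from execution makes each step easy to inspect or log before anything touches the file; a call such as `probe("interview.wav")` would then return the codec, sample rate, channel count, and duration.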

Installation

To integrate this skill into your OpenClaw environment, execute the following command in your terminal:

clawhub install openclaw/skills/skills/ivangdavila/audio

Please ensure that your host system has ffmpeg and ffprobe installed and accessible on your system PATH. For advanced capabilities, also install SoX (deep noise reduction), Whisper (transcription, via OpenAI or a local pip installation), and Demucs (stem separation).
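A quick way to confirm these prerequisites before running the skill is to check which executables resolve on PATH. This is a hedged sketch: `check_tools` is a hypothetical helper, and it assumes the tools are installed under their usual command names (`ffmpeg`, `ffprobe`, `sox`, `whisper`, `demucs`). It uses only Python's standard `shutil.which`.

```python
import shutil

# Required vs. optional tools, following the installation notes above.
REQUIRED = ("ffmpeg", "ffprobe")
OPTIONAL = {"sox": "noise reduction", "whisper": "transcription", "demucs": "stem separation"}


def check_tools(which=shutil.which) -> dict:
    """Report which command-line tools resolve on PATH.

    `which` is injectable so the check can be exercised without real tools.
    """
    found = {name: which(name) is not None for name in (*REQUIRED, *OPTIONAL)}
    missing_required = [name for name in REQUIRED if not found[name]]
    return {"found": found, "missing_required": missing_required}


if __name__ == "__main__":
    report = check_tools()
    for name, ok in report["found"].items():
        role = OPTIONAL.get(name, "required")
        print(f"{name:8} {'ok' if ok else 'missing'} ({role})")
```

A non-empty `missing_required` list means core conversions will fail; missing optional tools only disable the corresponding advanced features.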

Use Cases

This skill is built for content creators, developers, and researchers. Common use cases include:

Media Preparation: Converting raw recording formats to distribution-ready MP3 or AAC files.
Podcast Production: Normalizing audio tracks to a standard podcast loudness target (-16 LUFS) so listeners hear consistent levels across episodes.
Content Transcription: Leveraging Whisper to generate accurate SRT or VTT files from recorded meetings or interviews.
Audio Archiving: Converting between lossless formats like WAV and FLAC.
Post-Production: Cleaning up noisy audio, adjusting playback speed, or extracting voice stems from music tracks.
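The use cases above map onto a handful of well-known command lines. The sketch below only builds argument lists (it does not run anything), and the helper names are hypothetical rather than part of the skill. The flags themselves are standard: FFmpeg's `libmp3lame`, `aac`, and `flac` encoders, `-vn`, and the `atempo` filter; the openai-whisper CLI's `--model`, `--output_format`, and `--output_dir`; and Demucs's `--two-stems`.

```python
def to_lossy(src: str, dst: str, codec: str = "libmp3lame", bitrate: str = "192k") -> list[str]:
    """Distribution copy: libmp3lame for MP3, or codec="aac" for AAC."""
    return ["ffmpeg", "-i", src, "-c:a", codec, "-b:a", bitrate, dst]


def to_flac(src: str, dst: str) -> list[str]:
    """Lossless archive copy (e.g. WAV -> FLAC); decodes back bit-exact."""
    return ["ffmpeg", "-i", src, "-c:a", "flac", dst]


def extract_audio(video: str, dst: str) -> list[str]:
    """Drop the video stream (-vn); FFmpeg picks an encoder from dst's extension."""
    return ["ffmpeg", "-i", video, "-vn", dst]


def change_speed(src: str, dst: str, factor: float) -> list[str]:
    """Change tempo without shifting pitch using the atempo filter."""
    # 0.5-2.0 per atempo instance is accepted by every FFmpeg version.
    if not 0.5 <= factor <= 2.0:
        raise ValueError("keep a single atempo factor within 0.5-2.0")
    return ["ffmpeg", "-i", src, "-filter:a", f"atempo={factor}", dst]


def transcribe(src: str, out_dir: str, fmt: str = "srt", model: str = "small") -> list[str]:
    """openai-whisper CLI call producing SRT or VTT subtitles."""
    if fmt not in ("srt", "vtt"):
        raise ValueError("this sketch only covers srt and vtt")
    return ["whisper", src, "--model", model, "--output_format", fmt, "--output_dir", out_dir]


def separate_vocals(src: str) -> list[str]:
    """Demucs two-stem split into vocals and accompaniment."""
    return ["demucs", "--two-stems=vocals", src]
```

Each list can be passed straight to `subprocess.run(..., check=True)`, for example `subprocess.run(change_speed("talk.m4a", "talk_fast.m4a", 1.25), check=True)`. The `small` Whisper model is an assumed default; larger models trade speed for accuracy.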

Example Prompts

"I have an interview recording in WAV format, please normalize it for a podcast, convert to MP3 at 192kbps, and generate a transcription file."
"Can you extract the audio from this video file and increase the playback speed by 1.25x?"
"Please remove the background hum from this file and save it as an OGG for WhatsApp sharing."
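The third prompt (hum removal plus OGG output) could plausibly translate into a single FFmpeg call like the one sketched here. Several choices are assumptions, not the skill's documented behavior: hum is treated as mains interference at 50 Hz or 60 Hz plus harmonics, removed with FFmpeg's `bandreject` notch filter at an arbitrary starting Q of 30; the output uses Opus (`libopus`, which requires an FFmpeg build with libopus) in an OGG container, which is assumed to be suitable for WhatsApp sharing. The function names are hypothetical.

```python
def hum_filter(mains_hz: int = 50, harmonics: int = 3) -> str:
    """Chain narrow notch filters at the mains frequency and its harmonics."""
    if mains_hz not in (50, 60):
        raise ValueError("mains hum is 50 Hz or 60 Hz depending on region")
    # width_type=q with w=30 gives a narrow notch; tune by ear if speech suffers.
    notches = [
        f"bandreject=f={mains_hz * k}:width_type=q:w=30"
        for k in range(1, harmonics + 1)
    ]
    return ",".join(notches)


def hum_removed_ogg(src: str, dst: str, mains_hz: int = 50, bitrate: str = "32k") -> list[str]:
    """Notch out hum, then encode to Opus; a .ogg dst selects the OGG container."""
    return [
        "ffmpeg", "-i", src,
        "-af", hum_filter(mains_hz),
        "-c:a", "libopus", "-b:a", bitrate,
        dst,
    ]
```

A notch chain leaves broadband content untouched, which suits a steady tonal hum; broadband hiss would instead call for a denoiser such as FFmpeg's `afftdn` or SoX's noise-reduction effect.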

Tips & Limitations

File Management: The Audio skill does not store files persistently. It is your responsibility to manage and back up your source and processed files.
Precision: Always provide the target platform (e.g., Spotify, Apple Music) so the agent can apply the correct LUFS normalization settings.
Security: This skill requires file-read, file-write, and code-execution access, because it reads your media, writes results, and runs external tools such as FFmpeg. Ensure your environment is secure when running automated scripts.
Performance: Complex tasks like Demucs stem separation or local Whisper transcription are CPU/GPU intensive and may take longer depending on file size and hardware specs.
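The platform-specific loudness tip can be made concrete with FFmpeg's `loudnorm` filter (EBU R128), whose `I`, `TP`, and `LRA` options set integrated loudness, true peak, and loudness range. The target table below is an assumption to verify against each platform's current specification: -16 LUFS is the podcast target named on this page, -14 LUFS is Spotify's commonly cited normalization level, and -16 LKFS is Apple's podcast guidance. Other platforms (such as Apple Music) are deliberately left out rather than guessed. The helper names are hypothetical.

```python
# Integrated-loudness targets in LUFS. Assumed values: verify before relying on them.
TARGETS = {
    "podcast": -16.0,         # podcast target named on this page
    "apple_podcasts": -16.0,  # Apple podcast guidance (-16 LKFS)
    "spotify": -14.0,         # Spotify's commonly cited reference level
}


def loudnorm_filter(platform: str, true_peak: float = -1.5, lra: float = 11.0) -> str:
    """Build a single-pass loudnorm filter string for a known platform."""
    try:
        target = TARGETS[platform.lower()]
    except KeyError:
        raise ValueError(f"unknown platform {platform!r}; choose from {sorted(TARGETS)}") from None
    # :g drops trailing zeros, so -16.0 renders as -16
    return f"loudnorm=I={target:g}:TP={true_peak:g}:LRA={lra:g}"


def normalize_cmd(src: str, dst: str, platform: str) -> list[str]:
    """FFmpeg call that normalizes src for the given platform."""
    return [
        "ffmpeg", "-i", src,
        "-af", loudnorm_filter(platform),
        # loudnorm resamples internally, so pin a standard output rate
        "-ar", "48000",
        dst,
    ]
```

Single-pass `loudnorm` is convenient but approximate. For tighter results, a common approach is two passes: first measure the file with `print_format=json`, then feed the measured values back into a second `loudnorm` run.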


Metadata

Author: @ivangdavila

Stars: 2190

Updated: 2026-03-07


Add to Configuration

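Merge this entry into your clawhub.json to enable the plugin. If the file already has a "plugins" object, add the entry inside it rather than creating a second one.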

{
  "plugins": {
    "official-ivangdavila-audio": {
      "enabled": true,
      "auto_update": true
    }
  }
}
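If clawhub.json already lists other plugins, pasting the whole object would either overwrite them or leave a duplicate "plugins" key. A small merge script avoids that. This is a sketch under stated assumptions: it treats clawhub.json as plain JSON in the current directory, the helper names are hypothetical, and only the plugin ID and the `enabled`/`auto_update` keys come from the snippet above.

```python
import copy
import json
from pathlib import Path

PLUGIN_ID = "official-ivangdavila-audio"


def enable_audio_plugin(config: dict, auto_update: bool = True) -> dict:
    """Return a copy of config with the Audio plugin enabled, keeping other plugins."""
    merged = copy.deepcopy(config)
    plugins = merged.setdefault("plugins", {})
    entry = plugins.setdefault(PLUGIN_ID, {})
    # update() preserves any extra keys the user already set on this entry
    entry.update({"enabled": True, "auto_update": auto_update})
    return merged


def update_file(path: str = "clawhub.json") -> None:
    """Read (or start) the config file, merge the entry, and write it back."""
    p = Path(path)
    config = json.loads(p.read_text()) if p.exists() else {}
    p.write_text(json.dumps(enable_audio_plugin(config), indent=2) + "\n")
```

Because `enable_audio_plugin` returns a new dictionary, the merged result can be reviewed before `update_file` writes anything to disk.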

Tags (AI)

#audio #ffmpeg #transcription #podcast #media

Safety Score: 4/5

Flags: file-read, file-write, code-execution

Related Skills

Animations

Create performant web animations with proper accessibility and timing.

ivangdavila 2190

Arduino

Develop Arduino projects avoiding common wiring, power, and code pitfalls.

ivangdavila 2190

Bulgarian

Write Bulgarian that sounds human. Not formal, not robotic, not AI-generated.

ivangdavila 2190

Arabic

Write Arabic that sounds human. Not formal, not robotic, not AI-generated.

ivangdavila 2190

Assistant

Manage tasks, communications, and scheduling with proactive and organized support.

ivangdavila 2190