audio-to-text-and-video-to-text

Transcribe audio and video files into text using OpenAI's Whisper API. Use this skill whenever a user wants to convert any audio or video file to text — including MP3, MP4, WAV, M4A, OGG, WEBM, MOV, AVI, FLAC, and more. Trigger this skill for any request involving: "transcribe", "convert audio to text", "speech to text", "get transcript of", "extract audio from video", "meeting notes from recording", "subtitles", "captions", or similar. Also trigger when the user uploads or references a media file and asks what was said, discussed, or mentioned in it. If unsure whether audio/video transcription is involved, use this skill.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/ahqazi-dev/audio-to-text-and-video-to-text

Transcription Skill

Converts audio and video files into clean, readable text using OpenAI's Whisper API and ffmpeg for media handling.

Overview

This skill handles the full pipeline:

Media extraction — use ffmpeg to strip audio from video files and convert to a Whisper-compatible format
Chunking — split large files (>25 MB) into overlapping segments to stay within API limits
Transcription — send each chunk to OpenAI's Whisper API
Assembly — merge chunk transcripts, adjusting timestamps, into a single clean output
Post-processing — optionally clean up with Claude (punctuation, speaker labels, summaries)
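The chunking and assembly steps can be sketched as follows. This is a minimal illustration, not the logic in scripts/transcribe.py: the names Segment, plan_chunks, and merge_segments are hypothetical, and it assumes chunks are planned by duration in seconds and that a segment starting inside already-covered audio can simply be dropped.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds, relative to the chunk the segment came from
    end: float
    text: str


def plan_chunks(total_s: float, chunk_s: float, overlap_s: float) -> list[tuple[float, float]]:
    """Return (start, end) windows covering [0, total_s] with the given overlap."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk length must exceed the overlap")
    windows: list[tuple[float, float]] = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start = end - overlap_s  # step back so neighbouring chunks overlap
    return windows


def merge_segments(chunks: list[tuple[float, list[Segment]]]) -> list[Segment]:
    """Shift each chunk's segments by the chunk offset and skip segments
    that start inside audio already covered by an earlier chunk."""
    merged: list[Segment] = []
    covered_until = 0.0
    for offset, segments in chunks:
        for seg in segments:
            start, end = seg.start + offset, seg.end + offset
            if start < covered_until:
                continue  # inside the overlap region, already transcribed
            merged.append(Segment(start, end, seg.text))
        if merged:
            covered_until = max(covered_until, merged[-1].end)
    return merged
```

For a 100-second file with 40-second chunks and a 5-second overlap, plan_chunks yields three windows starting at 0, 35, and 70 seconds. Dropping overlapped segments is the simplest de-duplication policy; a real implementation might instead compare text to stitch a sentence cut at the boundary.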

Requirements

ffmpeg must be installed (run which ffmpeg to verify; it is usually pre-installed in claude.ai's environment)
OpenAI API key stored in the environment as OPENAI_API_KEY — the user must provide this
Python packages: openai, pydub (install via pip if needed)
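These requirements can also be checked programmatically before running anything. The helper below is a hypothetical sketch (preflight is not part of the skill) that uses only the standard library and reports what is missing instead of failing on the first problem.

```python
import importlib.util
import os
import shutil


def preflight(env=None, packages=("openai", "pydub")):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for name in packages:
        # find_spec returns None when a top-level package is not importable
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package missing: {name}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```

Passing env explicitly keeps the function easy to test; in normal use it reads os.environ.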

Quick Start

When a user provides a media file, run the transcription script:

# Install dependencies if missing
pip install openai pydub --break-system-packages -q

# Run transcription
python /home/claude/transcription/scripts/transcribe.py \
  --input "/path/to/media/file" \
  --output "/mnt/user-data/outputs/transcript.txt" \
  --api-key "$OPENAI_API_KEY"

See scripts/transcribe.py for the full implementation.
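For orientation, a single-chunk request with the openai Python package might look like the sketch below. It is not taken from scripts/transcribe.py: build_request and transcribe_chunk are hypothetical names, and verbose_json is assumed as the response format because it returns segment timestamps with whisper-1 (to my understanding, the gpt-4o transcription models accept a narrower set of formats).

```python
from pathlib import Path


def build_request(model="whisper-1", language=None, prompt=None,
                  response_format="verbose_json"):
    """Collect keyword arguments for the transcription call,
    leaving out optional fields that were not supplied."""
    kwargs = {"model": model, "response_format": response_format}
    if language:
        kwargs["language"] = language
    if prompt:
        kwargs["prompt"] = prompt
    return kwargs


def transcribe_chunk(path, api_key, **options):
    """Send one audio file to the transcription endpoint and return the response."""
    # Imported lazily so build_request stays usable without the package installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    with Path(path).open("rb") as fh:
        return client.audio.transcriptions.create(file=fh, **build_request(**options))
```

A call such as transcribe_chunk("chunk_000.mp3", api_key, language="en") would then return the parsed response for that chunk; the file name here is only an example.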

Supported Formats

Category	Formats
Audio	mp3, wav, m4a, ogg, flac, aac, opus, wma
Video	mp4, mov, avi, mkv, webm, wmv, m4v

ffmpeg handles extraction from any of these.
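A rough sketch of how the table above could drive extraction: classify the input by extension, then build an ffmpeg command that drops the video stream and writes compact mono audio. The function names and the encoding settings (16 kHz, mono, 64 kbit/s) are assumptions chosen for illustration, not values confirmed by the skill.

```python
from pathlib import Path

AUDIO_EXTS = {"mp3", "wav", "m4a", "ogg", "flac", "aac", "opus", "wma"}
VIDEO_EXTS = {"mp4", "mov", "avi", "mkv", "webm", "wmv", "m4v"}


def media_kind(path):
    """Return "audio" or "video" based on the file extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported media extension: {ext or '(none)'}")


def ffmpeg_extract_cmd(src, dst):
    """Build an ffmpeg argument list that keeps only the audio track."""
    return [
        "ffmpeg", "-y",   # overwrite the output file if it exists
        "-i", src,
        "-vn",            # ignore any video stream
        "-ac", "1",       # downmix to mono
        "-ar", "16000",   # resample to 16 kHz
        "-b:a", "64k",    # audio bitrate
        dst,
    ]
```

The list form can be passed straight to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths that contain spaces.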

Options & Flags

Flag	Default	Description
`--model`	`whisper-1`	Transcription model to use (`whisper-1`, `gpt-4o-transcribe`)
`--language`	auto-detect	ISO 639-1 language code (e.g. `en`, `ar`, `fr`)
`--format`	`txt`	Output format: `txt`, `srt`, `vtt`, `json`
`--timestamps`	off	Include timestamps in output
`--chunk-size`	`20`	Max chunk size in MB (must be ≤ 25)
`--prompt`	none	Context hint to improve accuracy (e.g. domain vocab)
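One way these flags could be declared with argparse is sketched below. The actual parser in scripts/transcribe.py may differ; the defaults are copied from the table, and treating --input and --output as required is an assumption based on the Quick Start command.

```python
import argparse


def chunk_size_mb(value):
    """Parse --chunk-size and enforce the 25 MB upper bound."""
    size = int(value)
    if not 1 <= size <= 25:
        raise argparse.ArgumentTypeError("chunk size must be between 1 and 25 MB")
    return size


def build_parser():
    parser = argparse.ArgumentParser(description="Transcribe audio or video to text")
    parser.add_argument("--input", required=True, help="path to the media file")
    parser.add_argument("--output", required=True, help="path for the transcript")
    parser.add_argument("--api-key", default=None, help="OpenAI API key")
    parser.add_argument("--model", default="whisper-1")
    parser.add_argument("--language", default=None, help="ISO 639-1 code; omit to auto-detect")
    parser.add_argument("--format", choices=["txt", "srt", "vtt", "json"], default="txt")
    parser.add_argument("--timestamps", action="store_true")
    parser.add_argument("--chunk-size", type=chunk_size_mb, default=20)
    parser.add_argument("--prompt", default=None, help="context hint, e.g. domain vocabulary")
    return parser
```

Validating --chunk-size in a type function makes argparse reject out-of-range values with a usage message before any audio is processed.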

Output Formats

txt — plain text, ideal for most uses
srt — SubRip subtitle format (for video players)
vtt — WebVTT format (for web video)
json — full Whisper JSON with segments and timestamps

Step-by-Step Workflow

1. Check for the file

Ask the user to upload the file or provide a local path. Check:

ls /mnt/user-data/uploads/

2. Check ffmpeg and install deps

which ffmpeg && ffmpeg -version 2>&1 | head -1
pip install openai pydub --break-system-packages -q 2>&1 | tail -3

3. Get the API key

If OPENAI_API_KEY is not set in the environment, ask the user:

"Please provide your OpenAI API key — it starts with sk-. You can get one at https://platform.openai.com/api-keys"

4. Run the script

Metadata

Author: @ahqazi-dev

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-ahqazi-dev-audio-to-text-and-video-to-text": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.
