Official · Verified · media · Safety 4/5

openai-whisper-api

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

Why use this skill?

Easily transcribe audio files to text using the OpenAI Whisper API in OpenClaw. Streamline your workflow with accurate speech-to-text processing for meetings and notes.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/steipete/openai-whisper-api
Or add the plugin entry to your configuration file manually (see Add to Configuration below).

What This Skill Does

The openai-whisper-api skill integrates OpenAI's high-performance Whisper transcription engine directly into your OpenClaw workflow. It provides a robust, command-line-driven interface for converting audio files (meeting recordings, voice notes, interviews) into accurate text transcripts. By calling the official OpenAI /v1/audio/transcriptions endpoint, the skill leverages state-of-the-art speech-to-text models that achieve a low word error rate across a wide range of languages and accents. Whether you are automating the documentation of long-form audio or processing quick voice memos, this tool serves as a reliable conduit to cloud-based AI transcription services.
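The request the skill ultimately issues is a multipart POST to that endpoint. As a rough sketch, the helper below only assembles the form fields the API accepts; it does not send anything, and `build_transcription_fields` is an illustrative name, not part of the skill's actual code.

```python
# Sketch of the form fields sent to POST /v1/audio/transcriptions.
# build_transcription_fields is a hypothetical helper for illustration;
# the skill's internal implementation may differ.
API_URL = "https://api.openai.com/v1/audio/transcriptions"

def build_transcription_fields(audio_path, model="whisper-1",
                               response_format="text",
                               language=None, prompt=None):
    """Assemble the multipart form fields for a transcription request."""
    fields = {
        "file": audio_path,               # uploaded as multipart file data
        "model": model,
        "response_format": response_format,
    }
    if language:
        fields["language"] = language     # ISO-639-1 code, e.g. "en"
    if prompt:
        fields["prompt"] = prompt         # context hint to bias decoding
    return fields
```

In practice these fields would be attached to an HTTPS POST with an `Authorization: Bearer $OPENAI_API_KEY` header.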

Installation

To integrate this skill into your environment, use the OpenClaw hub CLI. Execute the following command in your terminal:

clawhub install openclaw/skills/skills/steipete/openai-whisper-api

Once installed, ensure your authentication credentials are set. You must provide an OPENAI_API_KEY either as an environment variable or within your ~/.clawdbot/clawdbot.json configuration file. Failure to provide a valid key will result in API authentication errors during execution.
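The lookup order described above (environment variable first, config file as fallback) can be sketched as follows. `resolve_api_key` is an illustrative helper, and the assumption that the key lives under an `OPENAI_API_KEY` field inside clawdbot.json is mine, not confirmed by the skill's documentation.

```python
import json
import os
from pathlib import Path

def resolve_api_key(env=os.environ, config_path="~/.clawdbot/clawdbot.json"):
    """Return the OpenAI API key from the environment, falling back to
    the clawdbot config file; None if neither source provides one."""
    key = env.get("OPENAI_API_KEY")
    if key:
        return key
    path = Path(config_path).expanduser()
    if path.is_file():
        try:
            # Assumed config field name; adjust to your actual config layout.
            return json.loads(path.read_text()).get("OPENAI_API_KEY")
        except (json.JSONDecodeError, OSError):
            return None
    return None
```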

Use Cases

This skill is ideal for professionals and developers who frequently handle audio data. Use it to:

  1. Generate meeting minutes by transcribing recorded team calls.
  2. Create accessibility transcripts for podcasts or lecture recordings.
  3. Batch-process voice notes into searchable text files for knowledge management.
  4. Extract structured data from audio prompts to feed into secondary LLM processes, improving the quality of downstream data analysis.

Example Prompts

  1. "Transcribe the file located at /home/user/downloads/interview_01.m4a and save the result as transcript.txt in my documents folder."
  2. "Please transcribe the meeting recording from yesterday. Use the whisper-1 model and ensure the transcription is output in JSON format so I can parse the timestamps."
  3. "Transcribe the voice note /tmp/voice.ogg and provide a prompt context suggesting that the speaker is discussing software architecture to improve accuracy."

Tips & Limitations

For optimal results, ensure your audio files are clear and free of heavy background noise. Whisper supports various file types including flac, m4a, mp3, mp4, mpeg, mpga, ogg, wav, and webm. Keep in mind that file size limits are enforced by OpenAI's API; you may need to split very long audio files before processing. Use the --language flag for multilingual audio, as explicit language specification often increases transcription accuracy significantly. Always review transcripts for sensitive data before sharing them, as audio processing is subject to the data retention policies of the API provider.
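These constraints lend themselves to a quick pre-flight check before uploading. The sketch below uses an assumed 25 MB cap (OpenAI's documented upload limit at the time of writing, which may change), and `check_audio_file` is an illustrative helper, not part of the skill.

```python
import os

# Formats Whisper accepts; the 25 MB cap reflects OpenAI's documented
# upload limit at the time of writing and may change.
SUPPORTED = {".flac", ".m4a", ".mp3", ".mp4", ".mpeg",
             ".mpga", ".ogg", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024

def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if os.path.exists(path) and os.path.getsize(path) > MAX_BYTES:
        problems.append("file exceeds the API size limit; split it first")
    return problems
```

Files that fail the size check can be split with any audio tool (for example, segmenting with ffmpeg) before being transcribed chunk by chunk.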

Metadata

Author: @steipete
Stars: 982
Views: 1
Updated: 2026-02-14
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-steipete-openai-whisper-api": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#transcription #audio #whisper #speech-to-text #ai
Safety Score: 4/5

Flags: file-read, file-write, external-api