Community Verified media Safety 4/5

openai-whisper

Local speech-to-text with the Whisper CLI (no API key).


What This Skill Does

The openai-whisper skill transcribes speech to text on your local machine using OpenAI's open-source Whisper models. Because everything runs locally, no API key is required: audio is not sent to a third-party service, and there are no per-request fees. Whisper decodes input through ffmpeg, so it accepts most common audio formats. It supports two tasks: transcription in the spoken language and translation into English. Models are downloaded automatically the first time they are used, typically to ~/.cache/whisper, so they need no separate setup step.

This skill suits situations where audio should not leave your machine, such as transcribing meeting minutes, converting voice notes into text, or translating spoken language. It defaults to the turbo model, which balances speed and accuracy. You can also choose a model explicitly, from smaller, faster ones to larger, more accurate ones.
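Here is a minimal sketch of selecting a model explicitly. It assumes the skill passes the model name through to the upstream whisper CLI's --model flag. The file notes.wav is hypothetical. The run helper is a local dry-run wrapper, not part of Whisper, that prints each command unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: picking a Whisper model explicitly.
# `run` is a hypothetical dry-run helper: it prints the command unless RUN=1.
set -euo pipefail

run() {
  if [[ "${RUN:-0}" == 1 ]]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

audio=notes.wav  # hypothetical input file

run whisper "$audio"                   # no --model: CLI default (turbo in recent releases)
run whisper "$audio" --model tiny      # fastest, least accurate
run whisper "$audio" --model base.en   # English-only variant; often better for English audio
run whisper "$audio" --model large     # slowest, most accurate
```

Run as-is, the script only prints the four commands. With RUN=1 in the environment it executes them, and the first use of each model triggers its download.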

Installation

To install the openai-whisper skill, use the following command in your ClawHub environment:

clawhub install openclaw/openclaw/skills/openai-whisper

This command downloads and sets up the skill. The skill drives the Whisper CLI, so the whisper command and ffmpeg (which Whisper uses to decode audio) must also be available on your system. Refer to the source repository openclaw/openclaw for more details and updates.
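A quick preflight check can confirm that the tools the skill relies on are present. This sketch makes two assumptions. First, the CLI is the whisper command installed by the upstream pip install -U openai-whisper package. Second, ffmpeg needs to be on PATH for audio decoding. The helper name check_deps is hypothetical.

```shell
#!/usr/bin/env bash
# Preflight sketch: report any required tool that is not on PATH.
# Assumes the Whisper CLI is the `whisper` command from `pip install -U openai-whisper`
# and that ffmpeg is needed for audio decoding.
set -euo pipefail

# check_deps (hypothetical helper): prints "missing: <tool>" for each absent tool
# and returns 0 only if every tool was found.
check_deps() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

if check_deps whisper ffmpeg; then
  echo "whisper and ffmpeg found"
else
  echo "install the CLI with: pip install -U openai-whisper (ffmpeg comes from your OS package manager)" >&2
fi
```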

Use Cases

Meeting Transcription: Transcribe audio recordings of meetings to create searchable text documents, minutes, or summaries. This is invaluable for team collaboration and record-keeping.
Voice Note Conversion: Convert your voice memos and personal notes into text for easier editing, sharing, and archival.
Accessibility: Make audio content more accessible by providing text transcripts for podcasts, lectures, or interviews.
Content Creation: Generate text from spoken word for blog posts, articles, or video captions.
Multilingual Transcription & Translation: Transcribe audio in many languages and, if needed, translate it directly into English (Whisper's translation task outputs English only).

Example Prompts

"Transcribe this meeting audio file /home/user/recordings/meeting_20231027.mp3 to text, using the medium model and save it as a .txt file in the current directory."
"Translate the audio from /mnt/audio/podcast_episode.m4a into English text and output it as an SRT subtitle file."
"Whisper the audio file ~/voice_memos/idea_draft.wav using the small model and save the output to the ~/transcripts folder."

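For reference, this is roughly what the three prompts above correspond to on the command line. The sketch assumes the skill maps requests onto the upstream whisper CLI flags (--model, --task, --output_format, --output_dir), and the skill's actual invocation may differ. The run helper is a hypothetical dry-run wrapper that prints each command unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: Whisper CLI equivalents of the three example prompts.
# `run` is a hypothetical dry-run helper: it prints the command unless RUN=1.
set -euo pipefail

run() {
  if [[ "${RUN:-0}" == 1 ]]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# 1. Meeting recording -> plain-text transcript in the current directory.
run whisper /home/user/recordings/meeting_20231027.mp3 \
  --model medium --output_format txt --output_dir .

# 2. Podcast -> English SRT subtitles. The turbo default does not translate,
#    so this sketch picks medium for --task translate.
run whisper /mnt/audio/podcast_episode.m4a \
  --model medium --task translate --output_format srt

# 3. Voice memo with the small model; files are written to ~/transcripts.
#    Without --output_format, the CLI writes every format (txt, vtt, srt, tsv, json).
run whisper ~/voice_memos/idea_draft.wav --model small --output_dir ~/transcripts
```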
Tips & Limitations

Model Management: Models are downloaded automatically on first use and stored in ~/.cache/whisper. You can delete or relocate these files manually, but this is rarely necessary.
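Performance Tuning: For faster processing on less powerful hardware, use smaller models such as tiny or base. For maximum accuracy, especially with noisy audio or complex language, use larger models such as medium or large. The default turbo model offers a good balance. However, it is not trained for translation and returns the original language even when translation is requested, so use medium or large for translation.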
Audio Quality: The accuracy of the transcription is heavily dependent on the quality of the audio input. Clear audio with minimal background noise will yield the best results.
File Formats: Whisper decodes audio through ffmpeg, so it reads most formats ffmpeg supports, provided ffmpeg is installed. Common formats such as MP3, WAV, and M4A are the safest choice.
Resource Intensive: Larger Whisper models are CPU- and memory-intensive. Upstream lists roughly 10 GB of VRAM for large versus about 1 GB for tiny or base. Make sure your system has enough headroom, especially when processing long audio files.
No API Key Required: Because no API key is needed, your audio stays private and there is no dependency on an external service. The trade-off is that processing speed depends entirely on your local hardware.
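The tips above translate into a few concrete commands. This sketch assumes the upstream openai-whisper CLI flags --model_dir, --device, --threads, and --fp16. The names talk.m4a, talk.wav, and /data/whisper-models are hypothetical. The run helper is a dry-run wrapper that prints each command unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: the Tips & Limitations above as commands.
# `run` is a hypothetical dry-run helper: it prints the command unless RUN=1.
set -euo pipefail

run() {
  if [[ "${RUN:-0}" == 1 ]]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Model management: see what has been downloaded to the default cache,
# or keep models elsewhere with --model_dir.
run ls -lh ~/.cache/whisper
run whisper talk.m4a --model base --model_dir /data/whisper-models

# File formats: Whisper decodes through ffmpeg (resampling to 16 kHz mono),
# so if a file fails to load, converting it first isolates the problem.
run ffmpeg -i talk.m4a -ar 16000 -ac 1 talk.wav

# Resource limits: force CPU, cap torch's CPU threads, and turn off fp16,
# which is not supported on CPU (Whisper otherwise warns and falls back to fp32).
run whisper talk.wav --model small --device cpu --threads 4 --fp16 False
```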


Metadata

Author: @openclaw

Stars: 369848

Updated: 2026-05-08



Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-openclaw-openai-whisper": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#speech-to-text #transcription #whisper #local-ai #audio-processing

Safety Score: 4/5

Flags: file-write, file-read

Related Skills

apple-notes

Create, view, edit, delete, search, move, or export Apple Notes via the memo CLI on macOS.

openclaw 370199

sherpa-onnx-tts

Local text-to-speech via sherpa-onnx (offline, no cloud).

openclaw 370199

goplaces

Query Google Places for text search, place details, resolve, reviews, or scriptable JSON via goplaces.

openclaw 370199

skill-creator

Create, edit, improve, tidy, review, audit, or restructure AgentSkills and SKILL.md files.

openclaw 370199

video-frames

Extract frames or short clips from videos using ffmpeg.

openclaw 370199