Community Verified media Safety 4/5

openai-whisper

Local speech-to-text with the Whisper CLI (no API key).

What This Skill Does

The openai-whisper skill runs speech-to-text transcription on your local machine using OpenAI's Whisper model. Unlike API-based services, it requires no API key, which avoids per-request costs and keeps your audio private. It accepts a variety of audio formats and can translate speech as well as transcribe it. Whisper models are downloaded automatically on first use, typically to ~/.cache/whisper, so no separate model setup is needed.

Use this skill when you need to process audio recordings without sending sensitive data to external servers, for example when transcribing meeting minutes, converting voice notes into text, or translating spoken language. The skill defaults to the turbo model, which balances speed and accuracy, but you can select another model explicitly, from smaller, faster ones to larger, more accurate ones.

Installation

To install the openai-whisper skill, use the following command in your ClawHub environment:

clawhub install openclaw/openclaw/skills/openai-whisper

This command downloads and sets up the components the skill needs to run locally. See the source repository, openclaw/openclaw, for details and updates.

Use Cases

  • Meeting Transcription: Transcribe meeting recordings into searchable text for minutes, summaries, and record-keeping.
  • Voice Note Conversion: Convert your voice memos and personal notes into text for easier editing, sharing, and archival.
  • Accessibility: Make audio content more accessible by providing text transcripts for podcasts, lectures, or interviews.
  • Content Creation: Generate text from spoken word for blog posts, articles, or video captions.
  • Multilingual Transcription & Translation: Transcribe audio in various languages and, if needed, translate it into English directly.

Example Prompts

  • "Transcribe this meeting audio file /home/user/recordings/meeting_20231027.mp3 to text, using the medium model and save it as a .txt file in the current directory."
  • "Translate the audio from /mnt/audio/podcast_episode.m4a into English text and output it as an SRT subtitle file."
  • "Whisper the audio file ~/voice_memos/idea_draft.wav using the small model and save the output to the ~/transcripts folder."
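
As a rough illustration of what the prompts above ask for, the following Python sketch builds the corresponding Whisper CLI argument lists. The options used (--model, --task, --output_format, --output_dir) are those of the upstream whisper command; the helper name build_whisper_command is hypothetical, and how the skill itself turns a prompt into a command is not documented here. The translation example picks the medium model on the assumption, taken from the upstream Whisper README, that turbo is not trained for translation.

```python
import shlex
from pathlib import Path

# Output formats accepted by the Whisper CLI's --output_format option.
OUTPUT_FORMATS = {"txt", "vtt", "srt", "tsv", "json", "all"}


def build_whisper_command(audio, model="turbo", task="transcribe",
                          output_format="txt", output_dir="."):
    """Return the argv list for one Whisper CLI run (nothing is executed).

    Hypothetical helper for illustration; it is not part of the skill.
    """
    if task not in {"transcribe", "translate"}:
        raise ValueError(f"unknown task: {task!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    return [
        "whisper",
        # Expand "~" here because no shell will do it for an argv list.
        str(Path(audio).expanduser()),
        "--model", model,
        "--task", task,
        "--output_format", output_format,
        "--output_dir", str(Path(output_dir).expanduser()),
    ]


# The three example prompts, expressed as CLI invocations.
examples = [
    build_whisper_command("/home/user/recordings/meeting_20231027.mp3",
                          model="medium"),
    # Assumption: medium instead of the default turbo for translation.
    build_whisper_command("/mnt/audio/podcast_episode.m4a", model="medium",
                          task="translate", output_format="srt"),
    build_whisper_command("~/voice_memos/idea_draft.wav", model="small",
                          output_dir="~/transcripts"),
]
for cmd in examples:
    print(shlex.join(cmd))
```

To actually run one of these, pass the list to subprocess.run(cmd, check=True) on a machine where the whisper executable is installed.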

Tips & Limitations

  • Model Management: Each model is downloaded automatically the first time it is used and stored in ~/.cache/whisper. You can manage these files manually, but that is rarely necessary.
  • Performance Tuning: For faster processing on less powerful hardware, use smaller models like tiny or base. For maximum accuracy, especially with noisy audio or complex language, opt for larger models like medium or large. The default turbo model offers a good balance.
  • Audio Quality: The accuracy of the transcription is heavily dependent on the quality of the audio input. Clear audio with minimal background noise will yield the best results.
  • File Formats: Whisper accepts a wide range of audio formats, but a common format such as MP3, WAV, or M4A is the safest choice.
  • Resource Intensive: Running large Whisper models can be CPU and memory intensive. Ensure your system has adequate resources for smooth operation, especially when processing long audio files.
  • No API Key Required: No API key is needed, which improves privacy and removes external dependencies. The trade-off is that transcription speed depends entirely on your local system's processing power.
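
For the model management tip above, a minimal Python sketch that lists what is currently in the model cache and how much disk space each file takes. It assumes the default ~/.cache/whisper location mentioned in this page and that checkpoints are stored as .pt files; the helper name list_cached_models is hypothetical.

```python
from pathlib import Path


def list_cached_models(cache_dir="~/.cache/whisper"):
    """Return sorted (model_name, size_in_MiB) pairs for .pt files in cache_dir."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return []  # cache directory missing: nothing has been downloaded yet
    return sorted(
        (p.stem, round(p.stat().st_size / 2**20, 1))
        for p in root.glob("*.pt")  # assumption: checkpoints use the .pt extension
    )


if __name__ == "__main__":
    for name, size_mib in list_cached_models():
        print(f"{name}: {size_mib} MiB")
```

Deleting a listed file frees the space; based on the download-on-first-use behavior described above, the model should be fetched again the next time it is requested.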

Metadata

Author: @openclaw
Stars: 289479
Views: 40
Updated: 2026-03-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-openclaw-openai-whisper": {
      "enabled": true,
      "auto_update": true
    }
  }
}
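
If your clawhub.json already lists other plugins, the entry has to be merged in rather than pasted over the file. The Python sketch below shows one way to do that on the parsed JSON, using only the keys shown in the snippet above; the helper name enable_plugin and the "some-other-plugin" entry are hypothetical, and any other structure your clawhub.json may have is not covered here.

```python
import json


def enable_plugin(config, plugin_id, auto_update=True):
    """Return a copy of config with plugin_id enabled; other plugins are kept."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": auto_update})
    plugins[plugin_id] = entry
    merged["plugins"] = plugins
    return merged


# Hypothetical existing configuration with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
updated = enable_plugin(existing, "official-openclaw-openai-whisper")
print(json.dumps(updated, indent=2))
```

In practice you would read the file with json.load, apply the helper, and write the result back with json.dump.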

Tags (AI)

#speech-to-text #transcription #whisper #local-ai #audio-processing
Safety Score: 4/5

Flags: file-write, file-read