What This Skill Does

The voice-transcribe skill transcribes audio with OpenAI's gpt-4o-mini-transcribe model. Designed for OpenClaw users who frequently handle voice memos or audio files, it converts spoken audio into clean, actionable text. Beyond plain transcription, it supports vocabulary hints and automated text replacements, so industry jargon, product names, and recurring transcription errors are corrected consistently once you have recorded them.

Installation

To install this skill, first make sure uv is installed; the skill's execution environment requires it. Then run the following command in your terminal:

clawhub install openclaw/skills/skills/darinkishore/voice-transcribe

Once installed, navigate to the skill directory at /Users/darin/clawd/skills/voice-transcribe/ and create a .env file that sets your OpenAI API key in the form OPENAI_API_KEY=sk-.... The skill needs this key to call OpenAI's transcription endpoints. With the environment configured, you can invoke the transcriber directly from the command line using uv run /Users/darin/clawd/skills/voice-transcribe/transcribe <audio-file>.
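For scripted use, the invocation above can be wrapped in a small helper. The following Python sketch is illustrative only: the function names has_api_key, build_command, and transcribe are hypothetical, the .env parsing is a simplified assumption about a plain KEY=value file, and it assumes the transcribe script prints the transcript to standard output.

```python
import subprocess
from pathlib import Path

# Path taken from the installation notes; adjust it for your machine.
SKILL_DIR = Path("/Users/darin/clawd/skills/voice-transcribe")


def has_api_key(env_text: str) -> bool:
    """Return True if the .env contents define a non-empty OPENAI_API_KEY."""
    for line in env_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name.strip() == "OPENAI_API_KEY" and value.strip():
            return True
    return False


def build_command(audio_file: str, skill_dir: Path = SKILL_DIR) -> list[str]:
    """Build the documented `uv run <skill_dir>/transcribe <audio-file>` call."""
    return ["uv", "run", str(skill_dir / "transcribe"), audio_file]


def transcribe(audio_file: str, skill_dir: Path = SKILL_DIR) -> str:
    """Run the skill and return its output (assumption: transcript on stdout)."""
    env_file = skill_dir / ".env"
    if not env_file.is_file() or not has_api_key(env_file.read_text()):
        raise RuntimeError(f"OPENAI_API_KEY is missing from {env_file}")
    result = subprocess.run(
        build_command(audio_file, skill_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the skill exits non-zero
    )
    return result.stdout.strip()
```

Calling transcribe("/tmp/note.mp3") would then run the same command you would type by hand, after first checking that the key is configured.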

Use Cases

The primary use case is turning unstructured voice input into text that downstream AI tools can analyze. This is particularly effective for users who rely on WhatsApp voice notes or dictation for quick logging. By piping the output of this skill into other terminal utilities (for example, appending | pbcopy to the transcribe command), you can move transcripts straight into document editors or AI chat sessions. It suits developers who need to document technical thoughts on the go and managers who need to transcribe meeting snippets while away from their desks.

Example Prompts

"OpenClaw, please transcribe the voice memo stored at /tmp/incoming-memo.ogg and summarize the key action items from the audio."
"Transcribe /Users/darin/downloads/meeting-clip.m4a and save the resulting text into a new note file."
"Can you transcribe the audio file at /tmp/note.mp3 and format the output as a clean bulleted list of tasks?"

Tips & Limitations

To maximize accuracy, keep vocab.txt up to date with frequently used, domain-specific terms that the model might otherwise misrecognize. If a specific phrase is consistently transcribed incorrectly, use replacements.txt to map the error to the correct text. The skill is optimized for English and does not support automatic language detection. Results are cached by the SHA256 hash of the audio file, which saves API costs and execution time when the same file is transcribed again. Be aware that this tool reads files on your local machine and communicates with the OpenAI API over the network.
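The caching and replacement behaviour described above can be pictured with a short Python sketch. It is not the skill's actual implementation: the function names, the cache directory layout, and in particular the "wrong -> right" line format for replacements.txt are assumptions made for illustration, so check the skill's own files for the real format.

```python
import hashlib
from pathlib import Path
from typing import Callable


def audio_cache_key(audio_path: Path) -> str:
    """Hex SHA256 digest of the file's bytes, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with audio_path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_transcribe(
    audio_path: Path,
    cache_dir: Path,
    transcribe_fn: Callable[[Path], str],
) -> str:
    """Return a cached transcript if one exists, else call transcribe_fn once."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical layout: one text file per audio hash.
    cache_file = cache_dir / f"{audio_cache_key(audio_path)}.txt"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = transcribe_fn(audio_path)
    cache_file.write_text(text, encoding="utf-8")
    return text


def parse_replacements(text: str) -> list[tuple[str, str]]:
    """Parse lines of the ASSUMED form 'wrong -> right'; skip anything else."""
    pairs = []
    for line in text.splitlines():
        wrong, sep, right = line.partition("->")
        if sep and wrong.strip():
            pairs.append((wrong.strip(), right.strip()))
    return pairs


def apply_replacements(transcript: str, pairs: list[tuple[str, str]]) -> str:
    """Apply each literal replacement in order."""
    for wrong, right in pairs:
        transcript = transcript.replace(wrong, right)
    return transcript
```

Because the key depends only on the file's bytes, renaming or moving an audio file would still hit the cache in this sketch, while any re-encoding would produce a new hash and a fresh API call.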
