What This Skill Does

The Stt skill, developed by bohnwuks, is a speech-to-text integration for the OpenClaw AI ecosystem. It uses OpenAI's Whisper model for transcription and is optimized for Brazilian Portuguese. The skill accepts audio in OGG, WAV, MP3, M4A, FLAC, AAC, and OPUS formats and turns it into text the agent can act on. It provides separate tools for three workflows: transcribing an individual file, continuously monitoring a directory, and processing files in bulk. This makes it useful for anyone handling high volumes of voice notes or media content that needs textual analysis.
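As a rough illustration of the format filtering described above, the sketch below collects the files in a folder whose extension matches one of the listed formats. The helper name `find_audio_files` and the constant `SUPPORTED_EXTENSIONS` are hypothetical and are not taken from the skill's source; the skill may select files differently.

```python
from pathlib import Path

# Formats listed in the skill description. The names below are hypothetical,
# not part of the skill's actual code.
SUPPORTED_EXTENSIONS = {".ogg", ".wav", ".mp3", ".m4a", ".flac", ".aac", ".opus"}


def find_audio_files(directory):
    """Return supported audio files directly inside `directory`, sorted by path."""
    root = Path(directory)
    return sorted(
        p for p in root.iterdir()
        # Compare the lowercased suffix so that "B.WAV" is accepted as well.
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_audio_files("media/inbound")` would return a list of `Path` objects that a batch job could then pass to the transcriber one at a time.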

Installation

To get started, install the Python dependencies by running pip install -r requirements.txt in the skill directory. The skill relies on FFmpeg for audio decoding; install it with winget install "Gyan.FFmpeg" on Windows, brew install ffmpeg on macOS, or sudo apt install ffmpeg on Linux. Next, create the inbound media directory by running mkdir -p ../../../media/inbound so the skill can track incoming files. Finally, add the skill to your agent environment with the command: clawhub install openclaw/skills/skills/bohnwuks/stt.
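After installation, a quick preflight check can confirm that the setup steps took effect. The sketch below is an assumption-laden example, not part of the skill: the function name `preflight` is hypothetical, and it only checks that an `ffmpeg` executable is discoverable on PATH and that the inbound directory exists (creating it if necessary).

```python
import shutil
from pathlib import Path


def preflight(inbound_dir):
    """Create the inbound directory if needed and report where ffmpeg was found."""
    inbound = Path(inbound_dir)
    # Equivalent in effect to `mkdir -p`: creates parents, no error if present.
    inbound.mkdir(parents=True, exist_ok=True)
    # shutil.which returns the executable's path, or None if it is not on PATH.
    ffmpeg_path = shutil.which("ffmpeg")
    return {"inbound": inbound, "ffmpeg": ffmpeg_path}
```

Running `preflight("../../../media/inbound")` from the skill directory returns a dictionary whose `"ffmpeg"` entry is `None` when FFmpeg still needs to be installed or added to PATH.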

Use Cases

This skill suits professionals and power users who rely on messaging platforms like WhatsApp or Telegram and need to convert voice memos into text for documentation, task management, or searchable archives. It is also useful for journalists and researchers who need to turn interviews, meetings, or field recordings into readable, timestamped transcripts without manual typing. With batch processing, it can work through large media libraries as an automated transcription assistant.

Example Prompts

"OpenClaw, please transcribe the latest audio file placed in my inbound folder and save the result to my meeting notes."
"Run a batch process on the current inbound directory to convert all pending voice messages to text."
"Transcribe the audio file located at /media/inbound/voice_note_001.ogg and include timestamps in the output."
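For the timestamped output requested in the last prompt, the sketch below shows one way such a transcript could be rendered. It assumes segments shaped like those returned by the open-source `whisper` Python package (dictionaries with `start`, `end`, and `text` keys, times in seconds); whether this skill exposes segments in that form is an assumption, and the function names are hypothetical.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments):
    """Turn segments with 'start', 'end', and 'text' keys into timestamped lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Segment text often carries leading whitespace, so strip it.
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)
```

For example, a single segment `{"start": 0.0, "end": 2.5, "text": " Bom dia."}` renders as `[00:00:00 - 00:00:02] Bom dia.`.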

Tips & Limitations

For best results, use audio that is relatively free of background noise, since Whisper's accuracy depends heavily on audio quality. The skill is tuned for Brazilian Portuguese; if you need multi-language input, check that the skill's environment variables are configured for it. The skill requires local read access to the audio files, and it does not automatically purge files after successful processing, so clean out your inbound folder regularly to prevent disk clutter. Also verify the permissions on the FFmpeg binary so the agent has the execution rights it needs for conversion tasks.
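Because processed files are not purged automatically, a small cleanup script can keep the inbound folder in check. The sketch below is a hypothetical helper, not part of the skill: it deletes files by age alone and does not know whether a file has been transcribed, so choose a retention period long enough for processing to finish.

```python
import time
from pathlib import Path


def purge_old_files(directory, max_age_days=7, now=None):
    """Delete files in `directory` last modified more than `max_age_days` ago.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(directory).iterdir():
        # Only plain files are considered; subdirectories are left untouched.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A scheduled job could call `purge_old_files("media/inbound", max_age_days=30)` once a day; the `now` parameter exists only to make the function easy to test.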
