
Speech to Text Transcription

Transcribe audio and video files to text with speaker detection, timestamps, and format conversion.

Why use this skill?

Convert audio and video to text with this OpenClaw skill. Features include speaker diarization, multi-format support, and privacy-focused local Whisper transcription.




What This Skill Does

The Speech to Text Transcription skill converts audio and video content into readable text. It handles a wide variety of inputs, including voice memos, lectures, interviews, and meeting recordings. Beyond plain conversion, it supports speaker diarization, timestamp generation, and export to multiple formats. It sits between raw audio files and structured text data, so you can manage, store, and analyze spoken content directly within your OpenClaw environment.

Installation

To integrate this skill into your workflow, use the standard OpenClaw installation command via your terminal:

clawhub install openclaw/skills/skills/ivangdavila/speech-to-text-transcription

Make sure ffmpeg is installed on your system; it is a required dependency for audio processing, splitting, and conversion. If you plan to use cloud-based providers for features such as diarization or high-accuracy models, set your API keys for OpenAI, AssemblyAI, or Deepgram as environment variables.
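
The prerequisites above can be checked before a first run. The following is a minimal sketch, not part of the skill itself: the helper name check_prerequisites is hypothetical, and the environment variable names are assumptions based on common provider conventions, so confirm them against each provider's documentation.

```python
import os
import shutil

# Assumed variable names (not confirmed by this skill's documentation).
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
}


def check_prerequisites(env=None):
    """Return (ffmpeg_found, providers_with_a_key_set)."""
    env = os.environ if env is None else env
    # shutil.which returns None when the executable is not on PATH.
    ffmpeg_found = shutil.which("ffmpeg") is not None
    # A provider counts as configured only if its variable is set and non-empty.
    configured = [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
    return ffmpeg_found, configured


if __name__ == "__main__":
    found, providers = check_prerequisites()
    print("ffmpeg on PATH:", found)
    print("cloud providers configured:", providers or "none (local Whisper only)")
```

If no cloud key is set, local Whisper transcription is still an option, as described under Tips & Limitations.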

Use Cases

This skill suits professionals and students who work with large amounts of recorded speech. Use it to transcribe:

Long-form meeting recordings for searchable archives.
Podcasts and interviews for show notes and blog content.
Voice memos for personal productivity and note-taking.
Educational lectures to create study guides and summaries.

Example Prompts

"Transcribe the interview file named 'ceo_interview.mp3' and make sure to identify who is speaking so I can distinguish between the CEO and the interviewer."
"I have a two-hour lecture recording at '/home/user/downloads/physics_101.mp4'. Please process this, generate an SRT file for subtitles, and extract the key action items at the end."
"Transcribe this voice memo from the URL [link] using the local Whisper model to keep it private and offline."
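
For readers curious what the SRT output requested in the second prompt looks like, here is a small sketch of rendering timestamped segments as SRT cues. The segment shape (start, end, speaker, text) and the function names are assumptions for illustration; the skill's actual internal format is not documented here.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, speaker, text) tuples as numbered SRT cues.

    The tuple layout is hypothetical; speaker may be None when
    diarization is not available.
    """
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        line = f"{speaker}: {text}" if speaker else text
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{line}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [(0.0, 2.5, "Speaker 1", "Welcome to the lecture."),
            (2.5, 4.0, None, "Thank you.")]
    print(to_srt(demo))
```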

Tips & Limitations

Pre-processing: For noisy audio, clean up the file with ffmpeg before transcription.
File Size Management: Do not process files larger than 25 MB or longer than 2 hours in a single step; let the agent split the file into chunks to prevent timeouts.
Provider Selection: Match the provider to the task. Use local Whisper for private, free, offline transcription. Use AssemblyAI when you need precise speaker labeling. Use the OpenAI Whisper API when accuracy on complex audio matters most.
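
The file size tip can be illustrated with a rough chunk planner. This is a sketch under stated assumptions, not the skill's own chunking logic: it treats 25 MB as 25 * 1024 * 1024 bytes, assumes a roughly constant bitrate so that size scales with duration, and uses the hypothetical helper names plan_chunks and ffmpeg_cut_command. The ffmpeg command is only built, not executed.

```python
import math

MAX_BYTES = 25 * 1024 * 1024  # 25 MB limit from the tip above (byte size assumed)
MAX_SECONDS = 2 * 60 * 60     # 2 hour limit from the tip above


def plan_chunks(size_bytes, duration_s,
                max_bytes=MAX_BYTES, max_seconds=MAX_SECONDS):
    """Return (start, end) offsets in seconds so each chunk fits both limits.

    Assumes a roughly constant bitrate, so chunk size is proportional
    to chunk duration.
    """
    if size_bytes <= 0 or duration_s <= 0:
        raise ValueError("size and duration must be positive")
    # Enough parts to satisfy whichever limit is stricter.
    parts = max(math.ceil(size_bytes / max_bytes),
                math.ceil(duration_s / max_seconds))
    step = duration_s / parts
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(parts)]


def ffmpeg_cut_command(src, start, end, dst):
    """Build an ffmpeg argument list that copies one chunk without re-encoding.

    With stream copy, cut points may land slightly off the requested times.
    """
    return ["ffmpeg", "-ss", f"{start:.3f}", "-i", src,
            "-t", f"{end - start:.3f}", "-c", "copy", dst]


if __name__ == "__main__":
    # Example: a 60 MB, one hour recording splits into three 20 minute chunks.
    for n, (start, end) in enumerate(plan_chunks(60 * 1024 * 1024, 3600)):
        print(" ".join(ffmpeg_cut_command("meeting.mp3", start, end,
                                          f"meeting_{n:03d}.mp3")))
```

Splitting by time rather than by byte offset keeps each chunk a playable file, which is why the plan is expressed in seconds.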


Metadata

Author: @ivangdavila

Stars: 2102

Updated: 2026-03-06



Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-ivangdavila-speech-to-text-transcription": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#transcription #audio #whisper #diarization #productivity

Safety Score: 4/5

Flags: file-write, file-read, external-api, code-execution
