Official Verified media Safety 5/5

elevenlabs-stt

Speech-to-text with ElevenLabs Scribe V2. Use this skill when the user wants speech recognition, audio transcription, or speech-to-text, or mentions elevenlabs or scribe.

Why use this skill?

Integrate ElevenLabs Scribe V2 into OpenClaw for high-speed speech-to-text, speaker diarization, and accurate multilingual audio transcription.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/hexiaochun/sutui-elevenlabs-stt

What This Skill Does

The elevenlabs-stt skill uses the ElevenLabs Scribe V2 model to transcribe speech to text within your AI environment. It converts audio files (mp3, ogg, wav, m4a, and aac) into structured text. Beyond plain transcription, Scribe V2 offers automatic language detection, speaker diarization (distinguishing between different speakers in a recording), and audio event tagging (e.g., laughter or applause). This makes it well suited to transcribing meetings, interviews, or multimedia content where context and speaker identification are critical.

Installation

To add this capability to your OpenClaw agent, run the following command in your terminal:

clawhub install openclaw/skills/skills/hexiaochun/sutui-elevenlabs-stt

Use Cases

This skill fits a range of professional and personal workflows:

  • Meeting Transcription: Automatically generate meeting minutes by separating different speakers in long audio recordings.
  • Content Creation: Quickly transcribe podcasts or video interviews to create blog posts or accessibility subtitles.
  • Professional Research: Process qualitative data from research interviews, using the keyterms feature so that specialized industry vocabulary is recognized correctly.
  • Multilingual Support: Process content across various languages including English, Mandarin, Japanese, Korean, Spanish, French, and German with high accuracy.

Example Prompts

  1. "Could you transcribe the audio from this meeting recording at https://example.com/meeting.mp3 and make sure to separate the speakers?"
  2. "I have an interview file here: https://example.com/interview.wav. Please use ElevenLabs Scribe to convert it to text and ensure all specialized medical terms are recognized."
  3. "Please perform a speech-to-text conversion on this Japanese audio file: https://example.com/jpn_session.m4a. Detect the language automatically and include tags for non-speech audio events."

Tips & Limitations

  • Precision: While automatic language detection works well, explicitly specifying the language_code often results in higher transcription accuracy, especially for audio with background noise or diverse accents.
  • Cost Efficiency: Using the keyterms parameter improves accuracy for technical jargon but increases the processing cost by 30%. Use it selectively for files where precision in specialized vocabulary is paramount.
  • Diarization: Set diarize: true for interviews or multi-party conversations so that the output labels each speaker.
  • Limits: The keyterms feature supports up to 100 terms, with each term limited to 50 characters. Prioritize the terms most likely to be misrecognized in your audio file.
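
The limits and cost note above can be checked before a request is sent. The following Python sketch is illustrative only: language_code, diarize, and keyterms are the parameter names mentioned in the tips, while audio_url, build_stt_params, and relative_cost are hypothetical names chosen for this example and may not match the skill's actual schema.

```python
from __future__ import annotations

from typing import Optional

MAX_KEYTERMS = 100           # maximum number of terms, per the tips above
MAX_KEYTERM_LENGTH = 50      # maximum characters per term, per the tips above
KEYTERMS_COST_FACTOR = 1.30  # keyterms raise processing cost by 30%


def build_stt_params(
    audio_url: str,
    language_code: Optional[str] = None,
    diarize: bool = False,
    keyterms: Optional[list[str]] = None,
) -> dict:
    """Assemble a parameter dict, rejecting keyterms that exceed the limits.

    `audio_url` is an assumed field name; the real skill may use another.
    """
    params: dict = {"audio_url": audio_url, "diarize": diarize}
    if language_code:
        # An explicit language skips automatic detection.
        params["language_code"] = language_code
    if keyterms:
        if len(keyterms) > MAX_KEYTERMS:
            raise ValueError(
                f"{len(keyterms)} keyterms given; the limit is {MAX_KEYTERMS}"
            )
        too_long = [t for t in keyterms if len(t) > MAX_KEYTERM_LENGTH]
        if too_long:
            raise ValueError(
                f"keyterms longer than {MAX_KEYTERM_LENGTH} characters: {too_long}"
            )
        params["keyterms"] = list(keyterms)
    return params


def relative_cost(params: dict) -> float:
    """Return the cost multiplier relative to a request without keyterms."""
    return KEYTERMS_COST_FACTOR if params.get("keyterms") else 1.0


params = build_stt_params(
    "https://example.com/interview.wav",
    language_code="en",
    diarize=True,
    keyterms=["myocardial infarction"],
)
print(params)
print(relative_cost(params))
```

Validating locally keeps an oversized keyterms list from reaching the API, and the multiplier makes the 30% surcharge visible when deciding whether keyterms are worth enabling for a given file.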

Metadata

Stars: 2387
Views: 0
Updated: 2026-03-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-hexiaochun-sutui-elevenlabs-stt": {
      "enabled": true,
      "auto_update": true
    }
  }
}
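
If clawhub.json already lists other plugins, the entry has to be merged in rather than pasted over the file. A minimal Python sketch of that merge follows, assuming the file is plain JSON with a top-level "plugins" object as shown above; enable_plugin and the some-other-plugin entry are hypothetical and used only for illustration.

```python
import json

PLUGIN_ID = "official-hexiaochun-sutui-elevenlabs-stt"


def enable_plugin(config: dict, plugin_id: str = PLUGIN_ID) -> dict:
    """Return a copy of `config` with the plugin enabled and auto-updating."""
    # Round-trip through JSON to deep-copy, so the caller's dict is untouched.
    merged = json.loads(json.dumps(config))
    plugins = merged.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    return merged


# Hypothetical existing configuration with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
print(json.dumps(enable_plugin(existing), indent=2))
```

Existing plugin entries are preserved, and running the merge twice produces the same result.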

Tags

#speech-to-text #transcription #elevenlabs #scribe
Safety Score: 5/5

Flags: external-api, network-access