elevenlabs-stt
Speech-to-text with ElevenLabs Scribe V2. Use this skill when the user wants speech recognition, audio transcription, or speech-to-text, or mentions elevenlabs or scribe.
Why use this skill?
Integrate ElevenLabs Scribe V2 into OpenClaw for high-speed speech-to-text, speaker diarization, and accurate multilingual audio transcription.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/hexiaochun/sutui-elevenlabs-stt
What This Skill Does
The elevenlabs-stt skill uses the ElevenLabs Scribe V2 model to transcribe speech to text within your AI environment. It converts audio files (including mp3, ogg, wav, m4a, and aac) into structured text. Beyond plain transcription, Scribe V2 offers automatic language detection, speaker diarization (distinguishing between different speakers in a recording), and audio event tagging (e.g., laughter or applause). This makes it well suited to transcribing meetings, interviews, or multimedia content where context and speaker identification matter.
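As a rough illustration of the format list above, the following Python sketch checks a file name or URL against the five extensions mentioned. The function name is_supported_audio is hypothetical, and because the description says "including", the skill may accept other formats as well; treat this as a conservative pre-check, not the skill's actual validation logic.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions named in the skill description; the real list may be longer.
SUPPORTED_EXTENSIONS = {".mp3", ".ogg", ".wav", ".m4a", ".aac"}


def is_supported_audio(url_or_path: str) -> bool:
    """Return True if the extension is one of the formats listed above."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(url_or_path).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this could run before a file is submitted, so that an unsupported upload fails early with a clear message.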
Installation
You can easily integrate this capability into your OpenClaw agent by running the following command in your terminal:
clawhub install openclaw/skills/skills/hexiaochun/sutui-elevenlabs-stt
Use Cases
This skill is highly versatile and fits into various professional and personal workflows:
- Meeting Transcription: Automatically generate meeting minutes by separating different speakers in long audio recordings.
- Content Creation: Quickly transcribe podcasts or video interviews to create blog posts or accessibility subtitles.
- Professional Research: Process qualitative data from research interviews by utilizing the keyterms feature to ensure specialized industry vocabulary is recognized correctly.
- Multilingual Support: Process content across various languages including English, Mandarin, Japanese, Korean, Spanish, French, and German with high accuracy.
Example Prompts
- "Could you transcribe the audio from this meeting recording at https://example.com/meeting.mp3 and make sure to separate the speakers?"
- "I have an interview file here: https://example.com/interview.wav. Please use ElevenLabs Scribe to convert it to text and ensure all specialized medical terms are recognized."
- "Please perform a speech-to-text conversion on this Japanese audio file: https://example.com/jpn_session.m4a. Detect the language automatically and include tags for non-speech audio events."
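The prompts above map onto a small set of options. The sketch below shows one way those options might be collected into a parameter dictionary. The names language_code, diarize, and keyterms come from this page; audio_url, tag_audio_events, and the function build_transcription_params are assumptions made for illustration and may not match what the skill actually accepts.

```python
from typing import Any, Optional


def build_transcription_params(
    audio_url: str,
    language_code: Optional[str] = None,
    diarize: bool = False,
    tag_audio_events: bool = False,  # hypothetical name for audio event tagging
    keyterms: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Collect transcription options into one dictionary (illustrative only)."""
    params: dict[str, Any] = {
        "audio_url": audio_url,  # hypothetical key for the input file
        "diarize": diarize,
        "tag_audio_events": tag_audio_events,
    }
    # Omitting language_code leaves language detection to the model.
    if language_code is not None:
        params["language_code"] = language_code
    if keyterms:
        params["keyterms"] = list(keyterms)
    return params


# First prompt: separate the speakers in a meeting recording.
meeting = build_transcription_params(
    "https://example.com/meeting.mp3", diarize=True
)

# Third prompt: auto-detect the language and tag non-speech events.
japanese = build_transcription_params(
    "https://example.com/jpn_session.m4a", tag_audio_events=True
)
```

Leaving language_code unset in the Japanese example mirrors the prompt's request for automatic detection.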
Tips & Limitations
- Precision: While automatic language detection works well, explicitly specifying the language_code often results in higher transcription accuracy, especially for audio with background noise or diverse accents.
- Cost Efficiency: Using the keyterms parameter improves accuracy for technical jargon but increases the processing cost by 30%. Use it selectively for files where precision in specialized vocabulary is paramount.
- Diarization: Always enable diarize: true for interviews or multi-party conversations to ensure the output identifies unique speakers.
- Limits: The keyterms feature supports up to 100 terms, with each term limited to 50 characters. Ensure your list is optimized for the specific context of your audio file.
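The keyterms limits and the 30% surcharge stated above can be checked before a request is made. The following Python sketch is one possible pre-flight check. The helper names validate_keyterms and estimated_cost are hypothetical, the whitespace trimming and de-duplication are design choices of this sketch, and the skill itself may enforce the limits differently.

```python
MAX_KEYTERMS = 100         # limit stated in the tips above
MAX_KEYTERM_LENGTH = 50    # characters per term, as stated above
KEYTERMS_SURCHARGE = 0.30  # 30% cost increase, as stated above


def validate_keyterms(terms: list[str]) -> list[str]:
    """Trim, de-duplicate, and check terms against the stated limits."""
    cleaned: list[str] = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip empty entries
        if len(term) > MAX_KEYTERM_LENGTH:
            raise ValueError(
                f"keyterm exceeds {MAX_KEYTERM_LENGTH} characters: {term!r}"
            )
        if term not in cleaned:
            cleaned.append(term)
    if len(cleaned) > MAX_KEYTERMS:
        raise ValueError(
            f"{len(cleaned)} keyterms given; at most {MAX_KEYTERMS} allowed"
        )
    return cleaned


def estimated_cost(base_cost: float, use_keyterms: bool) -> float:
    """Apply the 30% surcharge when keyterms are used."""
    if use_keyterms:
        return base_cost * (1 + KEYTERMS_SURCHARGE)
    return base_cost
```

Running the validation first means an over-long list is rejected locally, before any paid processing starts.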
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-hexiaochun-sutui-elevenlabs-stt": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags
Flags: external-api, network-access
Related Skills
audio-transcriber
Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration
youtube-summarizer
Automatically fetch YouTube video transcripts, generate structured summaries, and send full transcripts to messaging platforms. Detects YouTube URLs and provides metadata, key insights, and downloadable transcripts.
elevenlabs-twilio-memory-bridge
FastAPI personalization webhook that adds persistent caller memory and dynamic context injection to ElevenLabs Conversational AI agents on Twilio. No audio proxying, file-based persistence, OpenClaw compatible.
voice-note-to-midi
Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing
spaces-listener
Record, transcribe, and summarize X/Twitter Spaces — live or replays. Auto-downloads audio via yt-dlp, transcribes with Whisper, and generates AI summaries.