What This Skill Does

The eachlabs-tts skill connects your OpenClaw environment to the ElevenLabs Scribe v1 speech-to-text engine. Despite the "tts" in its name, it transcribes audio rather than synthesizing speech. The skill transcribes audio files fetched from remote URLs and, alongside the spoken text, can return speaker diarization (which speaker said which part of the recording) and word-level timestamps. This makes it well suited to converting interviews, podcasts, or meeting recordings into structured, machine-readable data.
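To make the diarization output concrete, here is a minimal sketch of collapsing word-level results into speaker turns. The source does not document the skill's output schema, so the input shape (a list of dicts with "text" and "speaker" keys) and the function name speaker_turns are assumptions for illustration only.

```python
from itertools import groupby


def speaker_turns(words):
    """Collapse consecutive words from the same speaker into (speaker, text) turns.

    `words` is an assumed shape: a list of dicts with "speaker" and "text" keys.
    """
    turns = []
    # groupby only merges adjacent items, which is what a "turn" means here.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["text"] for w in group)))
    return turns
```

If the real output uses different field names, only the two dictionary lookups need to change.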

Installation

To install the skill, run the following command in your OpenClaw terminal:

clawhub install openclaw/skills/skills/fatih-developer/eachlabs-tts

Once installed, make sure you have a valid API key from your EachLabs/ElevenLabs dashboard. You can either set it globally in your clawdbot.json configuration file under the skills entry, or export it as the environment variable EACHLABS_API_KEY for immediate use.
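If you call the skill from a wrapper script, it can help to fail early when the key is missing. The following is a minimal sketch that only covers the environment-variable route; the helper name require_api_key is hypothetical, and it does not read clawdbot.json, whose exact schema is not described here.

```python
import os


def require_api_key(env=None):
    """Return the EACHLABS_API_KEY value, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("EACHLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "EACHLABS_API_KEY is not set; export it or configure it in clawdbot.json"
        )
    return key
```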

Use Cases

This skill fits several professional workflows. Journalists can use it to get fast, accurate transcripts of interviews. Researchers can use it to analyze audio content, for example by tracking keywords across large collections of recordings. Developers can use the diarization output to generate automated meeting minutes, or use the word-level timestamps to create accessible subtitles for video content.
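The subtitle use case can be sketched as follows: word-level timestamps are grouped into short cues and rendered in SRT format. The input shape (dicts with "text", "start", and "end" in seconds) is an assumption, since the skill's actual JSON fields are not documented here, and the function names are hypothetical.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Render word-level timestamps as SRT cues of at most `max_words` words each.

    `words` is an assumed shape: dicts with "text", "start", and "end" keys.
    """
    # Split the word list into fixed-size chunks, one chunk per subtitle cue.
    cues = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(cue[0]["start"])
        end = format_srt_time(cue[-1]["end"])
        text = " ".join(w["text"] for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

Fixed-size chunks keep the sketch short; a production subtitle tool would more likely break cues at pauses or punctuation.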

Example Prompts

"Transcribe this interview audio at https://example.com/interview.mp3 and include speaker diarization so I know who said what."
"Please process the audio file from https://example.com/podcast.mp3, provide a full JSON output, and ensure the word timestamps are included for synchronization."
"Transcribe the lecture at https://example.com/lecture.mp3, specify the language as English, and tag any background audio events like laughter or applause."

Tips & Limitations

To maximize accuracy, specify the --lang flag whenever the language is known, as this helps the underlying model in multilingual contexts. The skill currently requires input files to be hosted at a publicly accessible URL; it cannot process local files unless they are first uploaded to a reachable hosting service. Make sure the target URL is actually reachable before submitting it. For complex audio, consider the --events flag, which tags non-speech sounds such as laughter or applause and gives the transcript more context.
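Because local paths are not accepted, a quick pre-check can catch an obvious mistake before a request is sent. This is a minimal sketch using only the Python standard library; the function name looks_like_remote_url is hypothetical, and the check verifies only the shape of the URL, not that the file is publicly reachable.

```python
from urllib.parse import urlparse


def looks_like_remote_url(value):
    """Return True if `value` is an http(s) URL with a host, False otherwise.

    This rejects local paths and file:// URLs; it does not test reachability.
    """
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```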
