Asr
Skill by ilyakam
Why use this skill?
Integrate fast, low-cost speech-to-text into your OpenClaw agents. Supports speaker diarization, 100+ languages, and multiple output formats.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/ilyakam/asr
What This Skill Does
The Asr skill provides high-performance automatic speech-to-text transcription by integrating with the Speech is Cheap (SIC) API. It is engineered for OpenClaw agents to process audio files or remote URLs rapidly and accurately. By handling complex tasks like speaker diarization, word-level timestamping, and multi-language support (100+ languages), this skill transforms raw audio into structured, machine-readable data. Its core advantage lies in its cost-effectiveness and speed, making it an ideal choice for high-volume, automated workflows where transcription throughput is critical.
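An agent usually post-processes the structured output mentioned above before using it. The following is a minimal sketch that assumes the transcript arrives as a list of word records, each with a speaker label, start and end times in seconds, and the word text. The field names (speaker, start, end, text) are hypothetical and are not the documented SIC schema. The function merges consecutive words from the same speaker into readable turns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical word record; field names are illustrative, not the SIC schema.
    speaker: str
    start: float  # seconds
    end: float  # seconds
    text: str


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words: List[Word]) -> List[Turn]:
    """Merge consecutive words spoken by the same speaker into one turn."""
    turns: List[Turn] = []
    for w in words:
        if turns and turns[-1].speaker == w.speaker:
            # Same speaker as the previous word: extend the current turn.
            last = turns[-1]
            last.end = w.end
            last.text = f"{last.text} {w.text}"
        else:
            # First word, or the speaker changed: start a new turn.
            turns.append(Turn(w.speaker, w.start, w.end, w.text))
    return turns


if __name__ == "__main__":
    words = [
        Word("A", 0.0, 0.4, "Hello"),
        Word("A", 0.4, 0.9, "everyone."),
        Word("B", 1.2, 1.5, "Hi!"),
    ]
    for turn in group_into_turns(words):
        print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```

Grouping by turn rather than by word keeps prompts short when the transcript is passed to a language model for summarization.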
Installation
To install this skill, run the following command in your terminal: clawhub install openclaw/skills/skills/ilyakam/asr. After installation, configure your credentials by setting the SIC_API_KEY environment variable in your .env configuration file. You can obtain a key at speechischeap.com.
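The skill reads its key from SIC_API_KEY, so you may want to confirm that the variable is present before starting a job. The following is a minimal sketch that uses only the Python standard library. It handles simple KEY=value lines with optional quotes, but not export prefixes or multi-line values. It checks the process environment first and then falls back to a .env file in the current directory. The helper names and file location are assumptions and are not part of the skill.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def find_sic_api_key(env_path: Path = Path(".env")) -> Optional[str]:
    """Return SIC_API_KEY from the process environment, else from the .env file."""
    key = os.environ.get("SIC_API_KEY") or parse_env_file(env_path).get("SIC_API_KEY")
    return key or None


if __name__ == "__main__":
    if find_sic_api_key() is None:
        raise SystemExit("SIC_API_KEY is not set; the CLI would return an authentication error.")
    print("SIC_API_KEY found.")
```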
Use Cases
This skill is perfect for developers and power users working on:
- Automated Meeting Notes: Processing recordings from calls to extract key action items.
- Content Localization: Generating SRT or VTT subtitle files for international video content.
- Audio Archiving: Indexing massive libraries of local audio files with searchable transcripts.
- Agent Intelligence: Allowing an AI agent to "listen" to remote media content for sentiment analysis or data extraction.
Example Prompts
- "Transcribe the audio from this URL https://example.com/podcast.mp3 and provide the JSON output including speaker labels."
- "Process the local file ./meeting_recording.wav and save the output as an SRT file for my video project."
- "Transcribe the provided audio file with word-level timestamps and ensure the output language is set to Spanish."
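The skill can write SRT files directly, as the second prompt shows. If you need to build subtitles from timestamped segments yourself, the format is straightforward. Each SRT file is a sequence of numbered cues. Every cue has an HH:MM:SS,mmm --> HH:MM:SS,mmm time range, then the caption text, then a blank line. The following is a minimal sketch of that rendering. The (start, end, text) tuple input is a hypothetical shape and is not the skill's output schema.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (hours:minutes:seconds,milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as SRT cues numbered from 1."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Each cue ends with a newline, so joining on "\n" leaves one blank line between cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Hello everyone."), (2.5, 4.0, "Welcome back.")]))
```

The timestamp is computed from whole milliseconds. This avoids values such as 59.9996 seconds printing as an invalid "00:00:60,000".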
Tips & Limitations
- Efficiency: The skill processes roughly 100 minutes of audio in about 1 minute, or about 100 times faster than real time.
- Privacy: Use the --private flag for sensitive data. This ensures that your audio and transcripts are not stored on the service provider's servers.
- Cost: Use the CH5 coupon code during signup to receive initial credits.
- Limitations: Accuracy varies with audio quality, so use clear recordings. The default confidence threshold is 0.5. Make sure SIC_API_KEY is set correctly; otherwise, the CLI will return an authentication error.
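You can use the speed and threshold figures above for quick planning. This sketch assumes that the quoted speed of about 100x holds for your audio. It also assumes that each word in a JSON transcript carries a confidence score between 0 and 1. The field name "confidence" is hypothetical, so check the actual output. The text above does not say whether the service itself drops words below 0.5, so the filter here runs only on the client side.

```python
from typing import Dict, List

# Speed figure from the tips above: about 100 minutes of audio per minute of processing.
SPEEDUP = 100.0
# Default threshold quoted above; the "confidence" field name below is hypothetical.
DEFAULT_CONFIDENCE = 0.5


def estimated_processing_minutes(audio_minutes: float, speedup: float = SPEEDUP) -> float:
    """Rough processing-time estimate from the quoted speed figure."""
    return audio_minutes / speedup


def confident_words(words: List[Dict], threshold: float = DEFAULT_CONFIDENCE) -> List[Dict]:
    """Keep words whose confidence meets the threshold; words without a score are dropped."""
    return [w for w in words if w.get("confidence", 0.0) >= threshold]


if __name__ == "__main__":
    # A 10-hour audio archive.
    print(estimated_processing_minutes(600))
    words = [{"text": "hello", "confidence": 0.92}, {"text": "uh", "confidence": 0.31}]
    print([w["text"] for w in confident_words(words)])
```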
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-ilyakam-asr": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-read, external-api
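To confirm that the clawhub.json fragment above is in place, a small sketch can parse the file with Python's json module. It then looks up the official-ilyakam-asr plugin key. The sketch assumes that clawhub.json is in the current directory. That location is a guess, and clawhub may read the file from somewhere else.

```python
import json
from pathlib import Path

# Plugin key taken from the clawhub.json fragment above.
PLUGIN_ID = "official-ilyakam-asr"


def asr_plugin_enabled(config_text: str, plugin_id: str = PLUGIN_ID) -> bool:
    """Return True only if the config explicitly sets enabled: true for the plugin."""
    config = json.loads(config_text)
    plugin = config.get("plugins", {}).get(plugin_id, {})
    return plugin.get("enabled") is True


if __name__ == "__main__":
    path = Path("clawhub.json")  # assumed location: the current directory
    if path.exists():
        print("enabled" if asr_plugin_enabled(path.read_text()) else "not enabled")
    else:
        print("clawhub.json not found")
```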