acestep-lyrics-transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.
Why use this skill?
Easily transcribe audio to LRC, SRT, or JSON formats with high precision using OpenAI Whisper or ElevenLabs Scribe via the OpenClaw agent.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/dumoedss/acestep-lyrics-transcription
What This Skill Does
The acestep-lyrics-transcription skill is a specialized OpenClaw tool that converts audio files into timestamped lyric files. It uses two speech-to-text models, OpenAI Whisper and ElevenLabs Scribe, to capture song lyrics and align them with the audio timeline. The skill supports three output formats: LRC (the standard for synchronized lyrics), SRT (subtitle files), and JSON (for word-level timing data). By automating transcription, it turns raw audio into readable, timed lyrical text, which makes it useful for audio engineers, content creators, and music enthusiasts.
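For readers unfamiliar with LRC, the sketch below shows how a list of timed lines maps onto the common `[mm:ss.xx]` tag layout. It is an illustration only, not the skill's own code; the function names `lrc_timestamp` and `to_lrc` and the `(start_seconds, text)` input shape are assumptions made for this example.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an LRC tag such as [01:23.45]."""
    # Work in integer centiseconds so minutes/seconds/fraction stay consistent.
    total_cs = round(seconds * 100)
    minutes, cs_rem = divmod(total_cs, 6000)
    secs, cs = divmod(cs_rem, 100)
    return f"[{minutes:02d}:{secs:02d}.{cs:02d}]"


def to_lrc(lines: list[tuple[float, str]]) -> str:
    """Render (start_seconds, text) pairs as LRC text, one tagged line each."""
    # Sorting keeps the output in playback order even if the input is not.
    return "\n".join(f"{lrc_timestamp(start)}{text}" for start, text in sorted(lines))


if __name__ == "__main__":
    print(to_lrc([(12.5, "First line"), (83.456, "Second line")]))
```

Each output line carries only a start time; LRC players typically show a line until the next tag is reached.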
Installation
To integrate this skill into your environment, use the OpenClaw command-line interface. Run the following command in your terminal:
clawhub install openclaw/skills/skills/dumoedss/acestep-lyrics-transcription
Once installed, navigate to the skill directory as specified in the configuration steps to perform your initial setup. Before attempting any transcription, use the included script commands to verify that an API key for either OpenAI or ElevenLabs is configured.
Use Cases
This skill is ideal for several creative and technical scenarios:
- Karaoke Preparation: Convert music tracks into high-quality .lrc files for karaoke systems or streaming platforms.
- Music Video Creation: Generate .srt files to display synchronized lyrics as subtitles in video editing software.
- Linguistic Analysis: Export transcription data in JSON format to study word-level timing and pacing in vocal performances.
- Archive Digitization: Convert archival music recordings into accessible, searchable text formats.
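As a point of comparison with LRC, the following sketch shows the SRT cue layout that video editors expect: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text. It is a minimal illustration under assumed inputs, not the skill's implementation; `srt_timestamp`, `to_srt`, and the `(start, end, text)` tuple shape are hypothetical names chosen here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 01:01:01,500."""
    # Integer milliseconds avoid drift between the hour/minute/second fields.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) cues as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each cue: 1-based index, time range, then the subtitle text.
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Hello"), (2.5, 5.0, "world")]))
```

Unlike LRC, every SRT cue needs an explicit end time, so line-level output must carry both boundaries.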
Example Prompts
- "Please transcribe the audio file located at /music/tracks/song_01.mp3 into an LRC file. Use the OpenAI provider."
- "I need to generate a subtitle file for this interview audio: /home/user/podcast.wav. Can you output it as an SRT file?"
- "Transcribe my latest demo track /audio/demo.mp3 and provide word-level timestamps in JSON format."
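Word-level JSON output can be post-processed into lyric lines. The sketch below groups words into lines wherever the silence between two words exceeds a threshold. The JSON schema used here (a list of objects with `word`, `start`, and `end` keys) is an assumption for illustration; the skill's actual output fields may differ, and `group_words` and `max_gap` are hypothetical names.

```python
import json


def group_words(words: list[dict], max_gap: float = 0.6) -> list[tuple[float, str]]:
    """Group timed words into lines, splitting where the pause exceeds max_gap seconds."""
    lines: list[list[dict]] = []
    current: list[dict] = []
    for w in words:
        # Start a new line when the gap since the previous word is too long.
        if current and w["start"] - current[-1]["end"] > max_gap:
            lines.append(current)
            current = []
        current.append(w)
    if current:
        lines.append(current)
    # Each line is reported as (start time of first word, joined text).
    return [(line[0]["start"], " ".join(w["word"] for w in line)) for line in lines]


if __name__ == "__main__":
    # Assumed schema, for illustration only.
    raw = """[
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 0.9},
        {"word": "Again", "start": 2.0, "end": 2.4}
    ]"""
    print(group_words(json.loads(raw)))
```

A pause-based split is only one heuristic; sung lyrics with long held notes may need a different threshold or a maximum line length.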
Tips & Limitations
- API Costs: Be aware that using paid APIs like OpenAI Whisper will incur small per-minute usage costs on your connected account. Monitor your billing thresholds.
- Audio Quality: Transcription accuracy is highly dependent on the clarity of the audio. Noisy environments or overlapping vocals may lead to inaccuracies.
- Language Support: Specify the correct language code (e.g., 'en', 'zh', 'ja') in the transcription command so the model is tuned to the language of your input audio.
- Security: Always use the built-in configuration tools to set API keys. To keep your credentials protected, never paste plain-text keys into configuration files by hand.
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-dumoedss-acestep-lyrics-transcription": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, external-api, code-execution
Related Skills
acestep
Use ACE-Step API to generate music, edit songs, and remix music. Supports text-to-music, lyrics generation, audio continuation, and audio repainting. Use this skill when users mention generating music, creating songs, music production, remix, or audio continuation.
acestep-songwriting
Music songwriting guide for ACE-Step. Provides professional knowledge on writing captions, lyrics, choosing BPM/key/duration, and structuring songs. Use this skill when users want to create, write, or plan a song before generating it with ACE-Step.
acestep-simplemv
Render music videos from audio files and lyrics using Remotion. Accepts audio + LRC/JSON lyrics + title to produce MP4 videos with waveform visualization and synced lyrics display. Use when users mention MV generation, music video rendering, creating video from audio/lyrics, or visualizing songs.