asr
Transcribe audio files to text using local speech recognition. Triggers on: "转录" (transcribe), "transcribe", "语音转文字" (speech to text), "ASR", "识别音频" (recognize audio), "把这段音频转成文字" (turn this audio into text).
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/0xfango/marswave-asr
What This Skill Does
The ASR (Automatic Speech Recognition) skill for OpenClaw transcribes audio files locally on your machine. It runs the coli CLI with local models such as SenseVoice or Whisper, so it needs no paid cloud API and your audio is not sent to a third party. Supported languages include Chinese, English, Japanese, Korean, and Cantonese. Beyond plain transcription, an optional 'polish' mode uses AI to clean up the raw transcript: it removes filler words, corrects punctuation, and improves overall readability.
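To illustrate what "removing fillers" means, here is a deliberately simple, rule-based sketch in Python. It is not how the skill's polish mode works (that step is AI-based); the function name `rough_polish` and both filler patterns are hypothetical and cover only a few English and Chinese interjections.

```python
import re

# Hypothetical filler patterns for illustration only; the skill's polish mode
# is AI-based and is not defined by a fixed word list.
EN_FILLERS = re.compile(r"\b(?:um+|uh+|erm)\b,?\s*", re.IGNORECASE)
ZH_FILLERS = re.compile(r"[嗯呃]+[,,]?")


def rough_polish(text: str) -> str:
    """Drop a few filler interjections and tidy whitespace around punctuation."""
    text = EN_FILLERS.sub("", text)
    text = ZH_FILLERS.sub("", text)
    text = re.sub(r"\s+([,.!?])", r"\1", text)  # remove space before punctuation
    text = re.sub(r"\s{2,}", " ", text)  # collapse repeated whitespace
    return text.strip()


if __name__ == "__main__":
    print(rough_polish("Um, so the uh meeting starts at ten ."))
    # → so the meeting starts at ten.
```

A fixed word list cannot restore missing punctuation or restructure sentences; that is the kind of cleanup an AI-based polish step is better suited for.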
Installation
To install this skill, run the following command in a terminal within the OpenClaw environment:
clawhub install openclaw/skills/skills/0xfango/marswave-asr
The skill also depends on two external tools: coli, installed globally with npm install -g @marswave/coli, and ffmpeg, which should be on your system PATH for the broadest audio file compatibility.
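Before the first run, it can help to confirm that both external tools are reachable. The Python sketch below checks PATH with `shutil.which` and builds an ffmpeg command that re-encodes a file as 16 kHz mono WAV. Both helper names are hypothetical, and the conversion is an assumption rather than a documented step of this skill: 16 kHz mono is the sample rate Whisper models use internally and a common choice for other speech models, but the skill may already handle conversion itself.

```python
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report whether each external tool named in the install notes is on PATH."""
    # coli comes from `npm install -g @marswave/coli`; ffmpeg is installed separately.
    return {tool: shutil.which(tool) is not None for tool in ("coli", "ffmpeg")}


def ffmpeg_to_16k_mono_wav(src: str, dst: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that re-encodes src as 16 kHz mono WAV.

    Assumption: this is an optional pre-processing step, not part of the skill.
    """
    # -y overwrites dst, -ar sets the sample rate, -ac sets the channel count.
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```

To actually convert a file, pass the returned list to `subprocess.run(..., check=True)`, for example with your own recording in place of the hypothetical `meeting.m4a`.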
Use Cases
This skill suits professionals, students, and content creators who regularly need fast, accurate text from audio: recorded meetings, lectures, voice memos, or interviews. It is especially useful in multilingual settings, where SenseVoice can handle input in several languages and also recognize emotion in speech. It is not intended for text-to-speech synthesis or complex audio post-production such as podcast editing; those tasks need dedicated skills.
Example Prompts
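- "转录这段音频:~/Downloads/meeting_recording.mp3" (Transcribe this audio: ~/Downloads/meeting_recording.mp3)
- "把这个语音文件转成文字,并帮我润色一下:./audio/interview.wav" (Convert this voice file to text and polish it for me: ./audio/interview.wav)
- "我想使用 sensevoice 模型将此录音识别为文本" (I want to use the sensevoice model to transcribe this recording)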
Tips & Limitations
Transcription runs entirely on your machine, so your audio data never leaves it. Note that the first run downloads the model weights (~60MB), which may take a moment depending on your internet connection. For the best accuracy, use the SenseVoice model, which is optimized for multilingual input and emotion recognition. Use absolute paths, or paths correctly relative to the project directory, to avoid 'file not found' errors.
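One way to avoid those errors is to resolve the path yourself before handing it to the skill. The hypothetical helper below expands a leading `~`, anchors relative paths to a project directory (defaulting to the current working directory), and fails early if the file does not exist. It is a sketch and not part of the skill.

```python
from pathlib import Path
from typing import Optional


def resolve_audio_path(raw: str, project_dir: Optional[str] = None) -> Path:
    """Return an absolute path to an existing audio file, or raise early."""
    path = Path(raw).expanduser()  # expand a leading "~" to the home directory
    if not path.is_absolute():
        # Relative paths are anchored to the project directory (default: cwd).
        base = Path(project_dir) if project_dir else Path.cwd()
        path = base / path
    path = path.resolve()  # normalize ".", ".." and symlinks
    if not path.is_file():
        raise FileNotFoundError(f"audio file not found: {path}")
    return path
```

For example, `resolve_audio_path("./audio/interview.wav", "/path/to/project")` returns the absolute path when the file exists and raises `FileNotFoundError` otherwise.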
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-0xfango-marswave-asr": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, file-write, code-execution
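If you prefer not to hand-edit clawhub.json, the plugin entry from the snippet above can be merged in with a short script. This is a sketch under stated assumptions: clawhub.json is taken to be a plain JSON object at a path you supply, and `enable_plugin` is a hypothetical helper. It keeps other plugins and top-level keys as they are, but rewrites the file with two-space indentation.

```python
import json
from pathlib import Path

# Plugin key copied from the clawhub.json snippet on this page.
PLUGIN_ID = "official-0xfango-marswave-asr"


def enable_plugin(config_path: str, plugin_id: str = PLUGIN_ID, auto_update: bool = True) -> dict:
    """Add or update one plugin entry in clawhub.json, leaving other keys untouched."""
    path = Path(config_path)
    # Start from an empty object if the file does not exist yet.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    entry = plugins.setdefault(plugin_id, {})
    entry["enabled"] = True
    entry["auto_update"] = auto_update
    # Rewrites the whole file with two-space indentation.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```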
Related Skills
explainer
Create explainer videos with narration and AI-generated visuals. Triggers on: "解说视频" (explainer video), "explainer video", "explain this as a video", "tutorial video", "introduce X (video)", "解释一下XX(视频形式)" (explain XX, in video form).
listenhub
Explain anything — turn ideas into podcasts, explainer videos, or voice narration. Use when the user wants to "make a podcast", "create an explainer video", "read this aloud", "generate an image", or share knowledge in audio/visual form. Supports: topic descriptions, YouTube links, article URLs, plain text, and image prompts.
image-gen
Generate AI images from text prompts. Triggers on: "生成图片" (generate image), "画一张" (draw one), "AI图" (AI image), "generate image", "配图" (add an illustration), "create picture", "draw", "visualize", "generate an image".
content-parser
Extract and parse content from URLs. Triggers on: a user-provided URL to extract content from, another skill needing to parse source material, "parse this URL", "extract content", "解析链接" (parse link), "提取内容" (extract content).