Official | Verified | utilities | Safety 5/5

asr

Transcribe audio files to text using local speech recognition. Triggers on: "转录" (transcribe), "transcribe", "语音转文字" (speech to text), "ASR", "识别音频" (recognize audio), "把这段音频转成文字" (convert this audio to text).

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/0xfango/marswave-asr

What This Skill Does

The ASR (Automatic Speech Recognition) skill for OpenClaw transcribes audio files directly on your machine. It uses the coli CLI with local models such as SenseVoice or Whisper, so no cloud API is required and your audio is not sent to a third-party service. Supported languages include Chinese, English, Japanese, Korean, and Cantonese. An optional 'polish' mode uses AI to clean up the raw transcript by removing filler words, correcting punctuation, and improving readability.

Installation

To install this skill, run the following command in your terminal within the OpenClaw environment: clawhub install openclaw/skills/skills/0xfango/marswave-asr. The skill also needs coli, installed globally with npm install -g @marswave/coli, and works best with ffmpeg available on your system PATH so that a wider range of audio formats can be handled.
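Before running the skill, it can help to confirm that the required tools are actually reachable from your PATH. The following is a minimal Python sketch and not part of the skill itself. It assumes the executables are named coli and ffmpeg, as the install instructions suggest, and the helper name find_missing_tools is hypothetical.

```python
import shutil


def find_missing_tools(tools=("coli", "ffmpeg")):
    """Return the names from `tools` that cannot be found on PATH.

    The executable names are assumptions taken from the install
    instructions; adjust them if your setup differs.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("coli and ffmpeg were both found on PATH.")
```

An empty list means both tools were found; anything else names what still needs to be installed.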

Use Cases

This skill suits professionals, students, and content creators who regularly work with audio files and need quick, accurate transcripts. Use it to transcribe recorded meetings, lectures, voice memos, or interviews. It is particularly useful in multilingual settings, where SenseVoice can handle input in several languages and also supports emotion recognition. It is not intended for text-to-speech synthesis or for audio post-production such as podcast editing, which require other, specialized skills.

Example Prompts

  1. "Transcribe this audio: ~/Downloads/meeting_recording.mp3"
  2. "Convert this voice file to text and polish it for me: ./audio/interview.wav"
  3. "I want to use the sensevoice model to recognize this recording as text"

Tips & Limitations

The ASR skill performs transcription entirely offline, so your audio data never leaves your machine. For the best accuracy, use the SenseVoice model, which is optimized for multi-language support and emotion recognition. The first time you run the command, the model weights (~60MB) are downloaded, which requires an internet connection and may take a moment. Use absolute audio file paths, or paths that are correct relative to the project directory, to avoid 'file not found' errors.
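One way to avoid path-related 'file not found' errors is to normalize the path before handing it to the skill. The sketch below is illustrative only: the function name resolve_audio_path and the base_dir parameter are hypothetical and not part of the skill or of coli. It expands ~, anchors relative paths to a base directory, and fails early if the file is missing.

```python
from pathlib import Path


def resolve_audio_path(raw_path, base_dir=None):
    """Expand `~`, make the path absolute, and verify the file exists.

    Relative paths are interpreted against `base_dir`, or against the
    current working directory when `base_dir` is not given.
    """
    path = Path(raw_path).expanduser()
    if not path.is_absolute():
        base = Path(base_dir) if base_dir is not None else Path.cwd()
        path = base / path
    path = path.resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    return path


if __name__ == "__main__":
    try:
        print(resolve_audio_path("./audio/interview.wav"))
    except FileNotFoundError as err:
        print(err)
```

Passing the resulting absolute path in your prompt removes any ambiguity about which directory a relative path refers to.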

Metadata

Author: @0xfango
Stars: 4473
Views: 0
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-0xfango-marswave-asr": {
      "enabled": true,
      "auto_update": true
    }
  }
}
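If your clawhub.json already lists other plugins, pasting the snippet by hand risks overwriting them. As a rough sketch, the entry could instead be merged programmatically. This assumes clawhub.json is plain JSON with a top-level "plugins" object, as in the snippet above; the function name enable_plugin and the sample "some-other-plugin" entry are hypothetical and not part of any ClawHub API.

```python
import json

PLUGIN_ID = "official-0xfango-marswave-asr"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of `config` with the plugin entry enabled.

    Existing plugins and unrelated top-level keys are preserved;
    the input dictionary is not modified.
    """
    plugins = dict(config.get("plugins", {}))
    entry = dict(plugins.get(plugin_id, {}))
    entry.update({"enabled": True, "auto_update": True})
    plugins[plugin_id] = entry
    return {**config, "plugins": plugins}


if __name__ == "__main__":
    # Hypothetical existing configuration, for illustration only.
    existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
    print(json.dumps(enable_plugin(existing), indent=2))
```

To apply this to a real file, load it with json.loads, pass the result through enable_plugin, and write the output back with json.dumps.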

Tags (AI)

#asr #transcription #speech-to-text #offline-ai #productivity
Safety Score: 5/5

Flags: file-read, file-write, code-execution