Official Verified communication Safety 4/5

speech-recognition

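General-purpose speech recognition skill. Supports multiple audio formats (ogg/mp3/wav/m4a) and uses the SiliconFlow SenseVoice API for speech-to-text. Triggers when a user sends a voice message or audio file, or needs audio transcribed.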

Why use this skill?

Easily convert voice messages and audio files to text using OpenClaw's speech-recognition skill, powered by the accurate SenseVoice API.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/demo112/speech-recognition

What This Skill Does

The speech-recognition skill converts audio to text for agents in the OpenClaw ecosystem. It sends audio to the SiliconFlow SenseVoice API and returns the transcript, and it is designed with Chinese and English speech in mind. This lets your agent work directly with voice messages, voice notes, and recorded files: once the audio has been transcribed, the agent can analyze, summarize, or act on the resulting text.

Installation

To add this capability to your agent, use the OpenClaw command-line interface. Execute the following command in your terminal:

clawhub install openclaw/skills/skills/demo112/speech-recognition

After installation, you must provide your SiliconFlow API credentials. Open the configuration file at ~/.openclaw/openclaw.json and set the providers.siliconflow.apiKey field to your valid secret key. Once the key is saved, the skill is active and ready to process incoming audio files.
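
To confirm the key is in place before relying on the skill, a small check like the following may help. It is a minimal sketch, not part of the skill itself: it assumes the dotted name providers.siliconflow.apiKey maps to nested JSON objects in openclaw.json, and the helper name read_siliconflow_key is hypothetical.

```python
import json
from pathlib import Path

# Path taken from the installation notes above.
CONFIG_PATH = Path.home() / ".openclaw" / "openclaw.json"


def read_siliconflow_key(config_path=CONFIG_PATH):
    """Return providers.siliconflow.apiKey, or None if it is missing or empty.

    Assumption: the dotted field name corresponds to nested JSON objects.
    """
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None

    node = config
    for key in ("providers", "siliconflow", "apiKey"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)

    # Treat blank or non-string values as "not configured".
    if isinstance(node, str) and node.strip():
        return node
    return None
```

Calling read_siliconflow_key() with no arguments inspects the default configuration path; a None result suggests the key still needs to be added.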

Use Cases

Voice Message Processing: Automatically transcribe long voice messages sent via messaging platforms so you can read them at your convenience.
Meeting Transcription: Feed recorded meeting audio snippets into the agent to generate meeting minutes or action items.
Content Creation: Record raw spoken ideas or thoughts and have the agent draft them into formal blog posts or emails.
Accessibility: Ensure that audio-only input from users is fully searchable and accessible within your agent's history and knowledge base.

Example Prompts

"(User uploads a .ogg voice file) Please transcribe this message for me."
"I just sent a voice note, can you summarize the key points mentioned in it?"
"Transcribe this audio file and fix any grammatical errors in the output."

Tips & Limitations

For best results, keep audio files under 10MB and shorter than 5 minutes. The skill supports multiple formats, including OGG, MP3, WAV, M4A, and FLAC, but MP3 is recommended for better compatibility and processing speed. If you encounter errors, use FFmpeg to normalize your files to a 16kHz sample rate and a single (mono) channel. This skill processes data through an external API, so make sure you are comfortable with your audio being transmitted to SiliconFlow's servers. Check your API usage quota regularly to avoid service interruptions during critical tasks.
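
The size check and FFmpeg normalization described above can be sketched as follows. This is an illustration under stated assumptions rather than the skill's own preprocessing: the function names are hypothetical, "10MB" is interpreted as 10 × 1024 × 1024 bytes, and FFmpeg must be installed and on PATH for normalize to work. The -ar and -ac options are FFmpeg's standard flags for sample rate and channel count.

```python
import subprocess
from pathlib import Path

# Assumption: "10MB" is read as 10 MiB; adjust if the service counts differently.
MAX_BYTES = 10 * 1024 * 1024


def is_within_size_limit(path, max_bytes=MAX_BYTES):
    """Return True if the file is strictly smaller than the size guideline."""
    return Path(path).stat().st_size < max_bytes


def build_ffmpeg_command(src, dst):
    """Build an FFmpeg command that resamples to 16 kHz and downmixes to mono."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", str(src),  # input file
        "-ar", "16000",  # output sample rate: 16 kHz
        "-ac", "1",      # output channels: mono
        str(dst),
    ]


def normalize(src, dst):
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

For example, normalize("voice.ogg", "voice.mp3") would write a 16 kHz mono MP3 next to the original; FFmpeg picks the output codec from the file extension. The 5-minute duration guideline is not checked here.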

Metadata

Author: @demo112

Stars: 2387

Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-demo112-speech-recognition": {
      "enabled": true,
      "auto_update": true
    }
  }
}
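
If clawhub.json already lists other plugins, pasting the snippet over the file would drop them. The following is a minimal sketch of merging the entry instead. The helper names are hypothetical, and the location of clawhub.json is not stated on this page, so the caller supplies the path.

```python
import json
from pathlib import Path

# Plugin identifier copied from the configuration snippet above.
PLUGIN_ID = "official-demo112-speech-recognition"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of config with the plugin entry added or replaced.

    Existing top-level keys and other plugin entries are preserved.
    """
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_file(path):
    """Merge the plugin entry into the JSON file at path, creating it if absent."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

Keeping enable_plugin free of file access makes the merge easy to inspect before anything is written to disk.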

Tags (AI)

#audio #transcription #voice #speech #siliconflow

Safety Score: 4/5

Flags: network-access, file-read, external-api
