zhipu-asr
Automatic Speech Recognition (ASR) using the Zhipu AI (BigModel) GLM-ASR model. Use it when you need to transcribe audio files to text. Supports Chinese audio transcription with context prompts, custom hotwords, and multiple audio formats.
Why use this skill?
Convert Chinese audio to text with the Zhipu AI GLM-ASR model, with support for custom hotwords, context-aware transcription, and multiple audio formats.
What This Skill Does
The zhipu-asr skill is an Automatic Speech Recognition (ASR) tool for the OpenClaw agent ecosystem. It sends spoken Chinese audio to Zhipu AI's GLM-ASR model and returns the transcribed text. It accepts a variety of audio formats and converts them automatically with ffmpeg. For recordings that contain specialized terminology, names, or industry jargon, from short voice notes to professional discussions, it supports context prompts and custom hotwords that help the model recognize those terms correctly.
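The automatic ffmpeg conversion step can be sketched as follows. This is a minimal illustration, not the skill's actual code: the helper name `ffmpeg_to_wav16k`, the `_16k.wav` output naming, and the choice to downmix to mono are assumptions. Only the 16 kHz WAV target reflects the format this page recommends.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def ffmpeg_to_wav16k(src: str, dst: Optional[str] = None) -> List[str]:
    """Build (but do not run) an ffmpeg command that converts `src` to 16 kHz mono WAV."""
    # Hypothetical naming convention: "talk.mp3" -> "talk_16k.wav". A distinct name
    # also avoids overwriting the input when it is already a .wav file.
    out = dst or str(Path(src).with_name(Path(src).stem + "_16k.wav"))
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", src,       # input in any format ffmpeg can decode (MP3, M4A, FLAC, ...)
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to one channel (assumption: mono input is preferred)
        out,
    ]


if __name__ == "__main__":
    cmd = ffmpeg_to_wav16k("interview_part2.mp3")
    print(shlex.join(cmd))
    # prints: ffmpeg -y -i interview_part2.mp3 -ar 16000 -ac 1 interview_part2_16k.wav
```

Building the argument list separately from running it keeps the command easy to inspect or log before passing it to `subprocess.run(cmd, check=True)`.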
Installation
To integrate this skill into your environment, use the OpenClaw package manager:
clawhub install openclaw/skills/skills/franklu0819-lang/zhipu-asr
Ensure you have set your Zhipu AI API credentials in your shell environment:
export ZHIPU_API_KEY="your-key-here"
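A minimal sketch of how a script might pick up the key exported above. The helper name `zhipu_auth_headers` is hypothetical, and sending the key as a Bearer token in the `Authorization` header is an assumption about the BigModel HTTP API; check the official API reference before relying on it.

```python
import os
from typing import Dict


def zhipu_auth_headers(env_var: str = "ZHIPU_API_KEY") -> Dict[str, str]:
    """Read the API key from the environment and build an HTTP auth header.

    Assumption: the BigModel HTTP API accepts the key as a Bearer token.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with an actionable message instead of sending an unauthenticated request.
        raise RuntimeError(f'{env_var} is not set; run: export {env_var}="your-key-here"')
    return {"Authorization": f"Bearer {key}"}
```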
Use Cases
- Meeting Transcription: Automatically transcribe recorded meeting audio, using summaries of previous segments as context to keep the flow and terminology consistent across segments.
- Technical Documentation: Transcribe voice notes taken during brainstorming sessions involving complex software architecture, using hotwords for specific library or project names.
- Medical or Professional Interviews: Use domain-specific hotwords so that specialized terminology, symptom names, or company-specific jargon are recognized correctly by the model.
- Content Creation: Quickly convert spoken scripts, podcast segments, or interview recordings into editable text drafts for further content refinement.
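The meeting-transcription pattern above, feeding earlier material back in as context, can be sketched as a loop. This is a hypothetical illustration: `transcribe` stands in for whatever call actually sends one segment plus a context string to GLM-ASR, and the tail of the transcript so far is used as a simple substitute for a "previous segment summary", which in practice might come from a separate summarization step.

```python
from typing import Callable, List, Sequence


def context_from_previous(texts: Sequence[str], max_chars: int = 200) -> str:
    """Return the last `max_chars` characters of the transcript so far.

    Simple stand-in for a summary of the previous segments.
    """
    if max_chars <= 0:
        return ""
    # Chinese transcripts need no separator between segments.
    return "".join(texts)[-max_chars:]


def transcribe_with_rolling_context(
    segments: Sequence[str],
    transcribe: Callable[[str, str], str],  # hypothetical: (audio_path, context) -> text
    max_chars: int = 200,
) -> List[str]:
    """Transcribe segments in order, passing each one the tail of the earlier output."""
    texts: List[str] = []
    for segment in segments:
        context = context_from_previous(texts, max_chars)
        texts.append(transcribe(segment, context))
    return texts
```

Because `transcribe` is just a parameter, the context-passing logic can be checked with a fake function before wiring in the real API call.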
Example Prompts
- "Transcribe the file marketing_meeting.wav and use the hotwords 'Campaign, ROI, Quarterly' to ensure marketing terms are captured correctly."
- "Process interview_part2.mp3. Use the following context: 'The previous discussion focused on the new Q3 product roadmap and the hiring strategy.'"
- "Transcribe technical_demo.wav. Please pay special attention to the following technical terms: 'React, Kubernetes, Microservices, API.'"
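Prompts like the ones above pass hotwords as a comma-separated phrase. A small, hypothetical helper for turning that phrase into a clean list is sketched below; the skill's own hotword parsing is not documented here, so the accepted separators (ASCII comma, full-width comma, and the Chinese enumeration comma 、) are assumptions.

```python
import re
from typing import List, Set


def parse_hotwords(raw: str) -> List[str]:
    """Turn a phrase like 'React, Kubernetes, Microservices, API.' into a deduplicated list."""
    # Split on ASCII commas, full-width commas (，) and the enumeration comma (、).
    parts = re.split(r"[,，、]", raw)
    seen: Set[str] = set()
    words: List[str] = []
    for part in parts:
        # Drop surrounding whitespace, quotes, and trailing sentence punctuation.
        word = part.strip().strip("'\"。.").strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words
```

Keeping the original order (rather than sorting) preserves whatever priority the user implied when listing the terms.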
Tips & Limitations
- Quality Control: Although the skill accepts WAV, MP3, OGG, M4A, AAC, FLAC, and WMA files, WAV sampled at 16 kHz (16000 Hz) is the recommended format for the best recognition accuracy.
- Constraints: Each file must not exceed 25 MB or 30 seconds. For longer recordings, split the audio into segments of 30 seconds or less and pass the preceding transcript through the context parameter so that each segment is transcribed with awareness of what came before.
- Prompt Engineering: Use the 'hotwords' feature liberally for non-standard nouns, proper names, and proprietary industry acronyms; supplying them up front can substantially improve accuracy on the first pass.
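The 25 MB / 30 s limits and the advice to split longer files can be checked before any upload. In this sketch the constant and helper names are hypothetical, and reading "25 MB" as 25,000,000 bytes is an assumption (the more conservative interpretation). Each `(start, length)` pair returned by `plan_segments` could then be cut out with ffmpeg's `-ss` and `-t` options.

```python
import math
from typing import List, Tuple

MAX_BYTES = 25 * 1000 * 1000  # "25 MB", read conservatively as decimal megabytes (assumption)
MAX_SECONDS = 30.0            # per-file duration limit stated on this page


def within_limits(size_bytes: int, duration_s: float) -> bool:
    """True if a single file can be sent as-is."""
    return size_bytes <= MAX_BYTES and duration_s <= MAX_SECONDS


def plan_segments(duration_s: float, max_len: float = MAX_SECONDS) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, length) chunks of at most max_len seconds."""
    if duration_s <= 0:
        return []
    count = math.ceil(duration_s / max_len)
    return [
        (i * max_len, min(max_len, duration_s - i * max_len))
        for i in range(count)
    ]


if __name__ == "__main__":
    print(plan_segments(75))  # → [(0.0, 30.0), (30.0, 30.0), (60.0, 15.0)]
```

For scale: 16-bit mono WAV at 16 kHz is 16,000 × 2 = 32,000 bytes per second, so a 30-second segment is about 960,000 bytes. For that format, the 30-second limit is reached long before the 25 MB one.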
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-franklu0819-lang-zhipu-asr": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-read, external-api
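If you prefer to script the clawhub.json change rather than paste the plugin snippet by hand, a merge like the following keeps any existing settings intact. This is a sketch: the location of clawhub.json is not stated on this page, so the path is left to the caller, and `enable_plugin` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict

PLUGIN_ID = "official-franklu0819-lang-zhipu-asr"


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> Dict[str, Any]:
    """Add or update one plugin entry in a clawhub.json file, keeping all other settings."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Create the "plugins" object if missing, then merge the two flags into this entry.
    entry = config.setdefault("plugins", {}).setdefault(plugin_id, {})
    entry.update({"enabled": True, "auto_update": True})
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return config
```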
Related Skills
zhipu-tts
Text-to-speech conversion using the Zhipu AI (BigModel) GLM-TTS model. Use it when you need to convert text to audio files with various voice options. Supports Chinese text synthesis with multiple voice personas, speed control, and multiple output formats.
feishu-file
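Feishu file-sending skill. Sends various types of files to Feishu chats, including documents, images, and archives; automatically detects the file type and handles the upload.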
clawhub-manager
ClawHub skill management tool. Wraps skill publishing, deletion, lookup, and search, making it easy to manage skills on ClawHub.
feishu-voice
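Feishu voice-message skill. Converts text to speech and sends it to Feishu; supports TTS generation, format conversion, duration reading, file upload, and message sending.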
md2pdf
Markdown-to-PDF skill. Converts Markdown files into polished PDF documents, with full support for Chinese text, code highlighting, and custom styles.