Official | Verified | ai models | Safety 4/5

zhipu-asr

Automatic Speech Recognition (ASR) using the Zhipu AI (BigModel) GLM-ASR model. Use it when you need to transcribe audio files to text. Supports Chinese audio transcription with context prompts, custom hotwords, and multiple audio formats.

Why use this skill?

Convert Chinese audio to text with the Zhipu AI GLM-ASR model. The skill supports custom hotwords, context-aware transcription, and multiple audio formats.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/franklu0819-lang/zhipu-asr

What This Skill Does

The zhipu-asr skill is an Automatic Speech Recognition (ASR) tool for the OpenClaw agent ecosystem. It uses Zhipu AI's GLM-ASR model to convert spoken Chinese audio into text. It accepts a variety of audio formats and handles format conversion automatically via ffmpeg. For both short snippets and complex professional dialogues, context prompts and custom hotwords help the transcript capture specialized terminology, names, and industry jargon correctly.

Installation

To integrate this skill into your environment, use the OpenClaw package manager:

clawhub install openclaw/skills/skills/franklu0819-lang/zhipu-asr

Ensure you have set your Zhipu AI API credentials in your shell environment:

export ZHIPU_API_KEY="your-key-here"
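
Before invoking the skill from your own scripts, it can help to confirm that the key is actually visible to the process. The following is a minimal sketch, not part of the skill itself: the helper name `get_api_key` is hypothetical, and the only thing taken from this page is the `ZHIPU_API_KEY` variable name and its placeholder value.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Zhipu AI API key from the environment, or raise if unusable.

    `get_api_key` is a hypothetical helper; only the variable name
    ZHIPU_API_KEY comes from the installation instructions.
    """
    env = os.environ if env is None else env
    key = env.get("ZHIPU_API_KEY", "").strip()
    # Treat a missing key and the documentation placeholder the same way.
    if not key or key == "your-key-here":
        raise RuntimeError("ZHIPU_API_KEY is not set to a real key")
    return key
```

Failing early like this gives a clearer error than a rejected API request later in the run.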

Use Cases

Meeting Transcription: Automatically transcribe recorded meeting audio, using previous segment summaries as context to maintain flow and terminology consistency.
Technical Documentation: Transcribe voice notes taken during brainstorming sessions involving complex software architecture, using hotwords for specific library or project names.
Medical or Professional Interviews: Utilize domain-specific hotwords to ensure that unique terminology, medical symptoms, or specific company jargon are recognized correctly by the AI model.
Content Creation: Quickly convert spoken scripts, podcast segments, or interview recordings into editable text drafts for further content refinement.
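
The meeting-transcription case, where each segment is transcribed with context from the previous one, can be sketched as a simple loop. This is an illustration under stated assumptions: `transcribe` is a hypothetical callable standing in for whatever actually invokes the skill, and the tail of the previous transcript is used as a stand-in for a proper summary.

```python
from typing import Callable, List, Sequence


def transcribe_with_context(
    segment_paths: Sequence[str],
    transcribe: Callable[[str, str], str],
    context_chars: int = 200,
) -> List[str]:
    """Transcribe segments in order, passing earlier text along as context.

    `transcribe(path, context)` is a hypothetical stand-in for the real
    call into the skill. The context here is simply the last
    `context_chars` characters of the previous transcript (an assumption).
    """
    transcripts: List[str] = []
    context = ""  # the first segment has no prior context
    for path in segment_paths:
        text = transcribe(path, context)
        transcripts.append(text)
        context = text[-context_chars:]
    return transcripts
```

Injecting the transcription function keeps the chaining logic independent of how the skill is called, so it can be exercised with a fake transcriber before any audio is sent to the API.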

Example Prompts

"Transcribe the file marketing_meeting.wav and use the hotwords 'Campaign, ROI, Quarterly' to ensure marketing terms are captured correctly."
"Process interview_part2.mp3. Use the following context: 'The previous discussion focused on the new Q3 product roadmap and the hiring strategy.'"
"Transcribe technical_demo.wav. Please pay special attention to the following technical terms: 'React, Kubernetes, Microservices, API.'"

Tips & Limitations

Quality Control: The skill handles several formats (WAV, MP3, OGG, M4A, AAC, FLAC, WMA), but WAV at 16,000 Hz (16 kHz) is the recommended format for the most accurate recognition.
Constraints: Individual files must not exceed 25 MB in size or 30 seconds in duration. Split longer recordings into segments and use the context parameter to preserve continuity across segments.
Prompt Engineering: Use the 'hotwords' feature liberally for non-standard nouns, proper names, and proprietary industry acronyms to improve first-pass accuracy.
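
The size and duration limits above suggest a pre-processing step for long recordings. Here is a minimal sketch of one way to plan it, assuming ffmpeg is installed and on the PATH. The function names are hypothetical, "25 MB" is read conservatively as 25,000,000 bytes, and mono output (`-ac 1`) is an assumption rather than a documented requirement.

```python
from typing import List, Tuple

MAX_BYTES = 25_000_000  # "25 MB", read conservatively as decimal megabytes (assumption)
MAX_SECONDS = 30.0      # per-file duration limit stated in the tips above


def within_limits(size_bytes: int, duration_s: float) -> bool:
    """True if a file can be submitted as-is without splitting."""
    return size_bytes <= MAX_BYTES and duration_s <= MAX_SECONDS


def plan_segments(duration_s: float, max_len_s: float = MAX_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, length) pairs, in seconds, that cover the whole recording."""
    segments: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        segments.append((start, min(max_len_s, duration_s - start)))
        start += max_len_s
    return segments


def ffmpeg_segment_cmd(src: str, dst: str, start_s: float, length_s: float) -> List[str]:
    """Build an ffmpeg command that cuts one segment as 16 kHz mono audio."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),  # seek to the segment start
        "-t", str(length_s),  # read at most this many seconds
        "-i", src,
        "-ar", "16000",       # resample to 16 kHz, the recommended rate
        "-ac", "1",           # mono: an assumption, not stated on this page
        dst,
    ]
```

Each command can be run with `subprocess.run(cmd, check=True)`, and the resulting segment files transcribed in order. A slightly shorter `max_len_s` leaves headroom below the 30-second limit.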

Metadata

Author: @franklu0819-lang

Stars: 2387

Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-franklu0819-lang-zhipu-asr": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#asr #speech-to-text #transcription #zhipu #audio

Safety Score: 4/5

Flags: file-read, external-api

Related Skills

zhipu-tts

Text-to-speech conversion using the Zhipu AI (BigModel) GLM-TTS model. Use it when you need to convert text to audio files with various voice options. Supports Chinese text synthesis with multiple voice personas, speed control, and output formats.

franklu0819-lang 2387

feishu-file

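Feishu file-sending skill. Sends files of various types to Feishu chats, including documents, images, and archives, automatically detecting the file type and handling the upload.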

franklu0819-lang 2387

clawhub-manager

ClawHub skill management tool. Wraps publishing, deleting, querying, and searching of skills to make it easier to manage skills on ClawHub.

franklu0819-lang 2387

feishu-voice

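Feishu voice-message skill. Converts text to speech and sends it to Feishu, with support for TTS generation, format conversion, duration reading, file upload, and message sending.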

franklu0819-lang 2387

md2pdf

Markdown-to-PDF skill. Converts Markdown files into polished PDF documents, with full support for Chinese text, code highlighting, and custom styles.

franklu0819-lang 2387