aliyun-speech-transcriber
Transcribe publicly accessible audio or video URLs with Aliyun speech services. Use when the user wants speech-to-text via Aliyun DashScope, needs transcript JSON or extracted plain text, or wants to process a cloud-accessible media URL (including signed Qiniu URLs) into transcription results.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/chenggongdu/aliyun-speech-transcriber
What This Skill Does
The Aliyun Speech Transcriber skill connects your media assets to Aliyun's speech transcription service. Using the paraformer-v2 model from DashScope, it processes publicly accessible audio or video URLs and converts the spoken content into structured JSON and human-readable plain text. The skill handles API authentication, polls the transcription task until it finishes, and parses the result, so developers, content creators, and analysts can extract transcripts from cloud-hosted media inside an OpenClaw workflow without calling the DashScope API directly.
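As a rough illustration of the "structured JSON to plain text" step, the sketch below joins the text of each transcript channel into one string. The result shape used here (a top-level `transcripts` list whose items carry a `text` field) is an assumption about the transcript JSON, and `extract_plain_text` is a hypothetical helper, not a function exposed by this skill.

```python
import json


def extract_plain_text(result: dict) -> str:
    """Join the text of every transcript channel, one channel per line.

    Assumes the result looks like {"transcripts": [{"text": "..."}, ...]}.
    Channels with missing or empty text are skipped.
    """
    transcripts = result.get("transcripts", [])
    parts = [t.get("text", "").strip() for t in transcripts]
    return "\n".join(p for p in parts if p)


# Hypothetical sample payload, for illustration only.
sample = json.loads("""
{
  "file_url": "https://storage.example.com/interview_01.mp3",
  "transcripts": [
    {"channel_id": 0, "text": "Hello and welcome."},
    {"channel_id": 1, "text": "Thanks for having me."}
  ]
}
""")

print(extract_plain_text(sample))
```

If the actual transcript JSON uses different field names, only the two `get` calls need to change.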
Installation
You can integrate this skill into your local environment using the OpenClaw CLI. Run the following command in your terminal:
clawhub install openclaw/skills/skills/chenggongdu/aliyun-speech-transcriber
Before running the skill, set ASR_DASHSCOPE_API_KEY (or the fallback DASHSCOPE_API_KEY) in your environment. You can also override the model, language hints, polling interval, and overall timeout with ALIYUN_SPEECH_MODEL, ALIYUN_SPEECH_LANG_HINTS, ALIYUN_SPEECH_POLL_SECONDS, and ALIYUN_SPEECH_TIMEOUT_SECONDS.
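The variables above could be resolved along these lines. This is a minimal sketch, not the skill's actual loader: `TranscriberConfig` and `load_config` are hypothetical names, the comma-separated format for language hints is an assumption, and the `zh,en` and 600-second defaults are placeholders. Only the paraformer-v2 model and the 5-second polling interval come from this page.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TranscriberConfig:
    api_key: str
    model: str
    lang_hints: list[str]
    poll_seconds: float
    timeout_seconds: float


def load_config(env: Mapping[str, str] = os.environ) -> TranscriberConfig:
    """Build a config from environment variables, applying the key fallback."""
    # Prefer the skill-specific key, then fall back to the generic one.
    api_key = env.get("ASR_DASHSCOPE_API_KEY") or env.get("DASHSCOPE_API_KEY")
    if not api_key:
        raise RuntimeError("Set ASR_DASHSCOPE_API_KEY or DASHSCOPE_API_KEY")

    # Assumption: hints are given as a comma-separated list such as "zh,en".
    hints = env.get("ALIYUN_SPEECH_LANG_HINTS", "zh,en")

    return TranscriberConfig(
        api_key=api_key,
        model=env.get("ALIYUN_SPEECH_MODEL", "paraformer-v2"),
        lang_hints=[h.strip() for h in hints.split(",") if h.strip()],
        poll_seconds=float(env.get("ALIYUN_SPEECH_POLL_SECONDS", "5")),
        # Assumption: the 600-second default is a placeholder value.
        timeout_seconds=float(env.get("ALIYUN_SPEECH_TIMEOUT_SECONDS", "600")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the fallback order easy to check without touching the real environment.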
Use Cases
- Automated Meeting Minutes: Quickly turn recordings of meetings stored on cloud storage (like Qiniu) into searchable text logs.
- Content Repurposing: Generate transcripts from video lectures or webinars for blog posts or documentation.
- Media Archiving: Batch process long-form audio files to extract text for internal knowledge management systems.
- Multilingual Support: Use language hints to transcribe content effectively across Chinese and English contexts.
Example Prompts
- "Transcribe the audio file located at https://storage.example.com/interview_01.mp3 using the Aliyun speech service."
- "I have two recordings here: https://files.com/part1.wav and https://files.com/part2.wav. Please extract the text from both of these using the aliyun-speech-transcriber."
- "Can you process this URL for me: https://example.com/video-lecture.mp4 and give me the plain text output of the transcript?"
Tips & Limitations
- Accessibility: The URLs provided must be reachable by the Aliyun servers. If your files are private, ensure you generate signed URLs (such as those from Qiniu) before passing them to the skill.
- Efficiency: The skill polls for results every 5 seconds by default. For long files, raise ALIYUN_SPEECH_TIMEOUT_SECONDS so that the task is not abandoned before transcription finishes.
- Best Practices: Keep API keys in environment variables; never hardcode credentials in your scripts or prompts.
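The interaction between the polling interval and the timeout can be sketched as a simple loop. This is an illustration under stated assumptions rather than the skill's implementation: `poll_until_done` and `fetch_status` are hypothetical names, and the status strings `SUCCEEDED` and `FAILED` are assumed terminal states.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it reports a terminal state or time runs out."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        # Assumption: these two strings mark a finished task.
        if status in ("SUCCEEDED", "FAILED"):
            return status
        # Give up if the next poll would land past the deadline.
        if clock() + poll_seconds > deadline:
            raise TimeoutError(f"task still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Demo with canned statuses and a no-op sleep, so it finishes immediately.
_demo = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(poll_until_done(lambda: next(_demo), sleep=lambda _: None))
```

With a 5-second interval, a file that needs ten minutes of processing requires a timeout of at least 600 seconds; a shorter timeout raises `TimeoutError` even though the task may still succeed on the server side.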
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-chenggongdu-aliyun-speech-transcriber": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, external-api
Related Skills
xiaohongshu-writer-expert
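Generates viral-style Xiaohongshu copy for any topic, automatically matching one of 8 style templates, and outputs a complete image-and-text plan including body text, image descriptions, cover copy, and search keywords.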
qiniu-upload
Upload local files to Qiniu Cloud and return a publicly accessible URL (or signed private URL). Use when the user wants to upload a local file path to Qiniu, obtain a CDN/public URL, prepare files for downstream cloud processing, or convert local audio/video/documents into externally accessible URLs for other skills such as speech transcription.