Official Verified media Safety 4/5

aliyun-speech-transcriber

Transcribe publicly accessible audio or video URLs with Aliyun speech services. Use when the user wants speech-to-text via Aliyun DashScope, needs transcript JSON or extracted plain text, or wants to process a cloud-accessible media URL (including signed Qiniu URLs) into transcription results.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/chenggongdu/aliyun-speech-transcriber

What This Skill Does

The Aliyun Speech Transcriber skill sends publicly accessible audio or video URLs to Aliyun DashScope for transcription, using the paraformer-v2 model, and returns the spoken content as structured JSON and as plain text. It suits developers, content creators, and analysts who need to extract transcripts from cloud-hosted media without manual steps. The skill handles API authentication, polls until the transcription task completes, and parses the result, so you do not have to call the DashScope API directly from your OpenClaw workflow.
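
As a rough illustration of the JSON-to-plain-text step described above, the sketch below joins the text of each transcript entry. The JSON layout (a top-level `transcripts` list whose items carry a `text` field) and the helper name `extract_plain_text` are assumptions made for this example; the skill's actual output format may differ.

```python
import json


def extract_plain_text(result_json: str) -> str:
    """Join the `text` field of every transcript entry (assumed shape)."""
    data = json.loads(result_json)
    # Assumed layout: {"transcripts": [{"text": "..."}, ...]}
    transcripts = data.get("transcripts", [])
    parts = [t.get("text", "").strip() for t in transcripts]
    # Skip empty entries so the output has no blank lines.
    return "\n".join(p for p in parts if p)


# Hypothetical sample data, not real service output.
sample = json.dumps({
    "transcripts": [
        {"channel_id": 0, "text": "Hello everyone."},
        {"channel_id": 1, "text": "Welcome to the meeting."},
    ]
})
print(extract_plain_text(sample))
```

Keeping the structured JSON alongside the plain text is useful when you later need per-channel or per-sentence detail that the flattened text discards.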

Installation

You can integrate this skill into your local environment using the OpenClaw CLI. Run the following command in your terminal:

clawhub install openclaw/skills/skills/chenggongdu/aliyun-speech-transcriber

Before running the skill, set ASR_DASHSCOPE_API_KEY (or the fallback DASHSCOPE_API_KEY) in your environment. To tune behavior, you can also set ALIYUN_SPEECH_MODEL, ALIYUN_SPEECH_LANG_HINTS, ALIYUN_SPEECH_POLL_SECONDS, and ALIYUN_SPEECH_TIMEOUT_SECONDS.
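
The key fallback and the optional settings can be sketched as follows. This is a minimal illustration, not the skill's own code: the `TranscriberConfig` class and `load_config` function are hypothetical, the comma-separated format and `zh,en` default for ALIYUN_SPEECH_LANG_HINTS are assumptions, and so is the 600-second timeout default. Only the variable names, the paraformer-v2 model, and the 5-second polling interval come from this page.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class TranscriberConfig:
    api_key: str
    model: str
    language_hints: List[str]
    poll_seconds: float
    timeout_seconds: float


def load_config(env: Mapping[str, str] = os.environ) -> TranscriberConfig:
    # Prefer the skill-specific key, then fall back to the generic one.
    api_key = env.get("ASR_DASHSCOPE_API_KEY") or env.get("DASHSCOPE_API_KEY")
    if not api_key:
        raise RuntimeError("Set ASR_DASHSCOPE_API_KEY or DASHSCOPE_API_KEY")
    # Assumption: hints are comma-separated; "zh,en" is an illustrative default.
    hints = env.get("ALIYUN_SPEECH_LANG_HINTS", "zh,en")
    return TranscriberConfig(
        api_key=api_key,
        model=env.get("ALIYUN_SPEECH_MODEL", "paraformer-v2"),
        language_hints=[h.strip() for h in hints.split(",") if h.strip()],
        poll_seconds=float(env.get("ALIYUN_SPEECH_POLL_SECONDS", "5")),
        # Assumption: the real default timeout is not stated on this page.
        timeout_seconds=float(env.get("ALIYUN_SPEECH_TIMEOUT_SECONDS", "600")),
    )


# Example with an explicit mapping instead of the real environment.
cfg = load_config({"DASHSCOPE_API_KEY": "sk-example"})
print(cfg.model, cfg.poll_seconds)
```

Passing the environment as a mapping keeps the function easy to test without touching real credentials.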

Use Cases

  • Automated Meeting Minutes: Quickly turn recordings of meetings stored on cloud storage (like Qiniu) into searchable text logs.
  • Content Repurposing: Generate transcripts from video lectures or webinars for blog posts or documentation.
  • Media Archiving: Batch process long-form audio files to extract text for internal knowledge management systems.
  • Multilingual Support: Use language hints to transcribe content effectively across Chinese and English contexts.

Example Prompts

  1. "Transcribe the audio file located at https://storage.example.com/interview_01.mp3 using the Aliyun speech service."
  2. "I have two recordings here: https://files.com/part1.wav and https://files.com/part2.wav. Please extract the text from both of these using the aliyun-speech-transcriber."
  3. "Can you process this URL for me: https://example.com/video-lecture.mp4 and give me the plain text output of the transcript?"

Tips & Limitations

  • Accessibility: The URLs provided must be reachable by the Aliyun servers. If your files are private, ensure you generate signed URLs (such as those from Qiniu) before passing them to the skill.
  • Efficiency: The skill polls for results every 5 seconds by default. For long recordings, raise ALIYUN_SPEECH_TIMEOUT_SECONDS so the task does not time out before transcription finishes.
  • Best Practices: Supply API keys through environment variables; never hardcode credentials in your scripts or prompts.
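
The interaction between the polling interval and the timeout can be sketched with a generic polling loop. This is an illustration under stated assumptions rather than the skill's implementation: `wait_for_task` and `get_status` are hypothetical names, and the status strings are assumed for the example. The status check is injected as a callable so the sketch runs without network access.

```python
import time
from typing import Callable


def wait_for_task(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status until a terminal state is seen or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        # Assumed terminal states; a real service may define more.
        if status in ("SUCCEEDED", "FAILED"):
            return status
        # Stop if the next poll would land past the deadline.
        if clock() + poll_seconds > deadline:
            raise TimeoutError(f"task still {status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Simulated task that finishes on the third poll.
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(wait_for_task(lambda: next(states), poll_seconds=0.01, timeout_seconds=1.0))
```

With a fixed interval, the timeout bounds the number of polls to roughly the timeout divided by the interval, which is why long recordings need a larger timeout rather than a shorter interval.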

Metadata

Stars: 3840
Views: 1
Updated: 2026-04-06

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-chenggongdu-aliyun-speech-transcriber": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#transcription #aliyun #speech-to-text #media-processing #automation
Safety Score: 4/5

Flags: network-access, external-api