Official Verified ai models Safety 4/5

alicloud-ai-audio-asr-realtime

Use when low-latency realtime speech recognition is needed with Alibaba Cloud Model Studio Qwen ASR Realtime models, including streaming microphone input, live captions, or duplex voice agents.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/cinience/alicloud-ai-audio-asr-realtime

What This Skill Does

The alicloud-ai-audio-asr-realtime skill provides a low-latency interface for real-time speech-to-text transcription powered by Alibaba Cloud's Qwen ASR models. It is designed for streaming environments: developers can feed live audio capture, such as microphone streams or duplex voice agent input, directly into their OpenClaw workflows. The skill handles the orchestration of streaming audio frames and emits partial results as soon as they are processed, which is critical for responsive conversational AI interfaces.

Installation

To integrate this skill into your project, run the following command within your OpenClaw environment:

clawhub install openclaw/skills/skills/cinience/alicloud-ai-audio-asr-realtime

Ensure that DASHSCOPE_API_KEY is set in your environment variables or in your ~/.alibabacloud/credentials file. To check that the request-preparation script is syntactically valid, run:

python -m py_compile skills/ai/audio/alicloud-ai-audio-asr-realtime/scripts/prepare_realtime_asr_request.py

If successful, the command completes silently. It only byte-compiles the script; it does not call the API or check your credentials.
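
A quick way to confirm that the API key is visible to Python before starting a session is sketched below. It checks only the DASHSCOPE_API_KEY environment variable; the ~/.alibabacloud/credentials file is not parsed, since its layout is not described here. The helper name has_dashscope_key is hypothetical and not part of the skill.

```python
import os


def has_dashscope_key(env=None):
    """Return True when DASHSCOPE_API_KEY is present and non-empty.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    return bool(env.get("DASHSCOPE_API_KEY", "").strip())


# Report whether the key is set, without ever printing the key itself.
print("DASHSCOPE_API_KEY set:", has_dashscope_key())
```

A False result here does not necessarily mean the setup is broken, because the key may still be supplied through the credentials file.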

Use Cases

  • Real-time Subtitling: Generate low-latency captions for live video streams or meetings to improve accessibility and information retention.
  • Voice-Agent Duplex Input: Power interactive voice agents that require near-instant transcription to determine intent and trigger downstream agentic actions without the "lag" associated with file-based processing.
  • Interactive Browser/Terminal Clients: Build responsive voice-controlled CLI tools or web interfaces that process audio streams directly from the user's microphone.

Example Prompts

  1. "Initialize a real-time transcription session for a microphone stream using the qwen3-asr-flash-realtime model at 16000Hz sampling rate."
  2. "Start listening to the live audio input stream and output the transcript fragments as they arrive, marking final sentences."
  3. "Prepare a configuration request for the real-time ASR skill, setting the chunk size to 200ms for high-responsiveness in a voice agent context."
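
The second prompt relies on the distinction between partial and final fragments. A minimal sketch of how a client might render them as live captions follows. The (text, is_final) event shape and the render_captions name are assumptions for illustration, not the skill's actual output format.

```python
def render_captions(events):
    """Yield the caption text to display after each recognition event.

    `events` is an iterable of (text, is_final) tuples (hypothetical shape).
    A partial fragment is shown after the committed sentences and is
    overwritten by the next event; a final fragment is committed.
    """
    committed = []
    for text, is_final in events:
        if is_final:
            committed.append(text)
            yield " ".join(committed)
        else:
            yield " ".join(committed + [text])


# Simulated event stream; a real session would produce these incrementally.
demo_events = [
    ("hel", False),
    ("hello wor", False),
    ("Hello world.", True),
    ("how", False),
    ("How are you?", True),
]

caption_lines = list(render_captions(demo_events))
for line in caption_lines:
    print(line)
```

Keeping committed sentences separate from the current partial means a revised partial never rewrites text that was already marked final.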

Tips & Limitations

  • Audio Format: Prefer 16 kHz mono PCM for the best balance between quality and latency. Other formats may require extra transcoding on the client side.
  • Chunking: Keep chunks small (ideally between 100ms and 300ms) to ensure low latency. A chunk must be fully captured before it can be sent, so larger chunks delay the delivery of partial results.
  • Scope: This skill is strictly for streaming audio. If you are dealing with static, pre-recorded audio files, use the batch alicloud-ai-audio-asr skill instead to save costs and handle longer durations more efficiently.
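
The chunking advice above translates into concrete buffer sizes. The sketch below assumes 16-bit samples (2 bytes each), which is a common PCM encoding but is not stated by the skill; the function names are illustrative.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz, as recommended above
CHANNELS = 1              # mono
BYTES_PER_SAMPLE = 2      # assumption: 16-bit signed PCM


def chunk_size_bytes(chunk_ms, sample_rate=SAMPLE_RATE_HZ,
                     channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Return the number of bytes covering `chunk_ms` milliseconds of PCM."""
    frames = sample_rate * chunk_ms // 1000
    return frames * channels * bytes_per_sample


def iter_chunks(pcm, chunk_ms=200):
    """Split a PCM byte buffer into chunks of `chunk_ms`; the last may be short."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# One second of silence stands in for captured microphone audio.
one_second = bytes(chunk_size_bytes(1000))
chunks = list(iter_chunks(one_second, chunk_ms=200))
print(chunk_size_bytes(200), len(chunks))  # 6400 5
```

Under these assumptions a 100ms chunk is 3200 bytes and a 300ms chunk is 9600 bytes, so the recommended range keeps each send well under 10 KB.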

Metadata

Author: @cinience
Stars: 3562
Views: 0
Updated: 2026-03-29

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-cinience-alicloud-ai-audio-asr-realtime": {
      "enabled": true,
      "auto_update": true
    }
  }
}
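
If clawhub.json already lists other plugins, the entry has to be merged rather than pasted over the file. A minimal sketch of that merge, operating on an in-memory dict, is shown here. The "some-other-plugin" entry and the enable_plugin helper are hypothetical, and the sketch assumes the file has the same top-level "plugins" object as the snippet above.

```python
import json

PLUGIN_ID = "official-cinience-alicloud-ai-audio-asr-realtime"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of `config` with the plugin entry added or replaced."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


# Hypothetical existing configuration with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": True}}}
updated = enable_plugin(existing)
print(json.dumps(updated, indent=2))
```

The helper returns a new dict instead of mutating its input, so the original configuration can be kept for comparison or rollback before anything is written to disk.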

Tags (AI)

#speech-to-text #streaming #asr #audio-processing #realtime
Safety Score: 4/5

Flags: network-access, external-api