Official, Verified, AI models, Safety 4/5

alicloud-ai-audio-asr-realtime

Use when low-latency realtime speech recognition is needed with Alibaba Cloud Model Studio Qwen ASR Realtime models, including streaming microphone input, live captions, or duplex voice agents.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/cinience/alicloud-ai-audio-asr-realtime


What This Skill Does

The alicloud-ai-audio-asr-realtime skill provides a low-latency interface for real-time speech-to-text transcription powered by Alibaba Cloud's Qwen ASR models. It is designed for streaming environments: developers can feed live audio, such as microphone streams or duplex voice agent input, directly into their OpenClaw workflows. The skill handles the orchestration of streaming audio frames and emits partial results as soon as they are processed, which is critical for responsive conversational AI interfaces.
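
As an illustration of how a client might consume partial and final results, here is a minimal sketch. The TranscriptBuffer class and the (text, is_final) event shape are hypothetical and are not taken from the skill or from the Qwen ASR API; the real event format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Hypothetical helper: keeps finalized sentences and the current partial one."""

    finals: list = field(default_factory=list)
    partial: str = ""

    def update(self, text, is_final):
        # A partial result replaces the previous partial; a final one is committed.
        if is_final:
            self.finals.append(text)
            self.partial = ""
        else:
            self.partial = text

    def render(self):
        # Finalized sentences first, then the in-progress fragment marked with "...".
        parts = list(self.finals)
        if self.partial:
            parts.append(self.partial + " ...")
        return " ".join(parts)
```

Replacing the partial text on every update, rather than appending it, keeps the display stable while the recognizer revises its hypothesis for the current sentence.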

Installation

To integrate this skill into your project, run the following command within your OpenClaw environment:

clawhub install openclaw/skills/skills/cinience/alicloud-ai-audio-asr-realtime

Ensure that DASHSCOPE_API_KEY is set in your environment variables or in your ~/.alibabacloud/credentials file. Then check that the request-preparation script compiles:

python -m py_compile skills/ai/audio/alicloud-ai-audio-asr-realtime/scripts/prepare_realtime_asr_request.py

If the script is syntactically valid, the command completes silently with exit status 0. Note that py_compile only byte-compiles the file; it does not run the script, contact the API, or validate your credentials.
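
A quick way to confirm that the key is visible to Python before starting a session is sketched below. The helper name get_dashscope_api_key is hypothetical, and only the environment variable is checked; the ~/.alibabacloud/credentials file is not read, because its format is not described here.

```python
import os


def get_dashscope_api_key(env=None):
    """Return the API key from the environment, or raise with a clear message."""
    # `env` defaults to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set or is empty")
    return key
```

Calling get_dashscope_api_key() at startup turns a missing key into an immediate, readable error instead of a failure partway through a streaming session.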

Use Cases

Real-time Subtitling: Generate instantaneous captions for live video streams or meetings to improve accessibility and information retention.
Voice-Agent Duplex Input: Power interactive voice agents that require near-instant transcription to determine intent and trigger downstream agentic actions without the "lag" associated with file-based processing.
Interactive Browser/Terminal Clients: Build responsive voice-controlled CLI tools or web interfaces that process audio streams directly from the user's microphone.

Example Prompts

"Initialize a real-time transcription session for a microphone stream using the qwen3-asr-flash-realtime model at 16000Hz sampling rate."
"Start listening to the live audio input stream and output the transcript fragments as they arrive, marking final sentences."
"Prepare a configuration request for the real-time ASR skill, setting the chunk size to 200ms for high-responsiveness in a voice agent context."

Tips & Limitations

Audio Format: Always prefer 16kHz mono PCM for the best balance between quality and latency. Using other formats may require unnecessary transcoding on the client side.
Chunking: Keep chunks small (ideally between 100ms and 300ms) to ensure low latency. A chunk cannot be sent until it has been fully captured, so the chunk duration sets a lower bound on how quickly partial results can arrive.
Scope: This skill is strictly for streaming audio. If you are dealing with static, pre-recorded audio files, use the batch alicloud-ai-audio-asr skill instead to save costs and handle longer durations more efficiently.
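
The chunk sizes above can be translated into byte counts with a little arithmetic. The sketch below assumes 16-bit (2-byte) samples, which is a common choice for PCM but is not stated on this page; the function names are illustrative only.

```python
def chunk_size_bytes(sample_rate_hz=16000, chunk_ms=200, channels=1, sample_width_bytes=2):
    """Bytes of raw PCM needed for one chunk of the given duration."""
    # Assumption: 2-byte (16-bit) samples unless the caller says otherwise.
    frames = sample_rate_hz * chunk_ms // 1000
    return frames * channels * sample_width_bytes


def iter_chunks(pcm, size):
    """Yield consecutive slices of `pcm`; the last slice may be shorter than `size`."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Under these assumptions, a 200ms chunk of 16kHz mono audio is 3200 frames, or 6400 bytes, and one second of audio splits into five such chunks.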


Metadata

Author: @cinience

Stars: 3562

Updated: 2026-03-29


Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-cinience-alicloud-ai-audio-asr-realtime": {
      "enabled": true,
      "auto_update": true
    }
  }
}
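
If clawhub.json already contains other plugins, the entry needs to be merged rather than pasted over the file. The following is a minimal sketch of such a merge; the helper names are hypothetical, and the file location is left to the caller because it is not specified here.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-cinience-alicloud-ai-audio-asr-realtime"


def enable_plugin(config, plugin_id=PLUGIN_ID):
    """Return a copy of `config` with the plugin entry added or overwritten."""
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    merged["plugins"] = plugins
    return merged


def update_config_file(path):
    """Read the JSON file at `path` (if it exists), enable the plugin, write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

Existing plugin entries are preserved, and the input dictionary is not modified in place.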

Tags (AI)

#speech-to-text #streaming #asr #audio-processing #realtime

Safety Score: 4/5

Flags: network-access, external-api

Related Skills

volcengine-compute-ecs

Manage Volcengine ECS instances and related resources. Use when users need instance inventory, lifecycle operations, troubleshooting, or automation templates for ECS.

cinience 3562

alicloud-ai-search-opensearch

Use OpenSearch vector search edition via the Python SDK (ha3engine) to push documents and run HA/SQL searches. Ideal for RAG and vector retrieval pipelines in Claude Code/Codex.

cinience 3562

alicloud-storage-oss-ossutil

Alibaba Cloud OSS CLI (ossutil 2.0) skill. Install, configure, and operate OSS from the command line based on the official ossutil overview.

cinience 3562

alicloud-platform-openapi-product-api-discovery

Discover and reconcile Alibaba Cloud product catalogs from Ticket System, Support & Service, and BSS OpenAPI; fetch OpenAPI product/version/API metadata; and summarize API coverage to plan new skills. Use when you need a complete product list, product-to-API mapping, or coverage/gap reports for skill generation.

cinience 3562

alicloud-ai-image-qwen-image

Generate images with Model Studio DashScope SDK using Qwen Image generation models (qwen-image, qwen-image-plus, qwen-image-max and snapshots). Use when implementing or documenting image.generate requests/responses, mapping prompt/negative_prompt/size/seed/reference_image, or integrating image generation into the video-agent pipeline.

cinience 3562