alicloud-ai-audio-asr-realtime
Use when low-latency realtime speech recognition is needed with Alibaba Cloud Model Studio Qwen ASR Realtime models, including streaming microphone input, live captions, or duplex voice agents.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/cinience/alicloud-ai-audio-asr-realtime
What This Skill Does
The alicloud-ai-audio-asr-realtime skill provides a low-latency interface for real-time speech-to-text transcription powered by Alibaba Cloud's Qwen ASR models. It is designed for streaming environments: developers can feed live audio, such as microphone streams or duplex voice agent input, directly into their OpenClaw workflows. The skill handles the orchestration of streaming audio frames and emits partial results as soon as they are processed, which is critical for responsive conversational AI interfaces.
Installation
To integrate this skill into your project, run the following command within your OpenClaw environment:
clawhub install openclaw/skills/skills/cinience/alicloud-ai-audio-asr-realtime
Ensure that DASHSCOPE_API_KEY is set in your environment variables or in your ~/.alibabacloud/credentials file. Verify your setup by compiling the request-preparation script: python -m py_compile skills/ai/audio/alicloud-ai-audio-asr-realtime/scripts/prepare_realtime_asr_request.py. If successful, the command completes silently with exit status 0. py_compile only checks that the script is syntactically valid Python (it writes bytecode to a __pycache__ directory); it does not contact the API or validate your credentials.
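A quick pre-flight check of the two credential locations mentioned above can be sketched as follows. This is a minimal illustration, not part of the skill: the function name find_dashscope_credentials is hypothetical, and the check only reports where a credential appears to be configured without reading or validating it.

```python
import os
from pathlib import Path


def find_dashscope_credentials(env=None, credentials_path=None):
    """Return "env", "file", or None depending on where credentials appear to be set.

    Hypothetical helper: it only checks for presence and never reads
    or validates the secret itself.
    """
    env = os.environ if env is None else env
    # Default location taken from the installation notes above.
    path = (
        Path(credentials_path)
        if credentials_path is not None
        else Path.home() / ".alibabacloud" / "credentials"
    )
    if env.get("DASHSCOPE_API_KEY"):
        return "env"
    if path.is_file():
        return "file"
    return None


if __name__ == "__main__":
    source = find_dashscope_credentials()
    if source is None:
        print("No credentials found: set DASHSCOPE_API_KEY or create ~/.alibabacloud/credentials")
    else:
        print(f"Credentials found via: {source}")
```

The environment variable is checked first here purely as a design choice for the sketch; the skill's own precedence between the two locations is not stated in this listing.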
Use Cases
- Real-time Subtitling: Generate instantaneous captions for live video streams or meetings to improve accessibility and information retention.
- Voice-Agent Duplex Input: Power interactive voice agents that require near-instant transcription to determine intent and trigger downstream agentic actions without the "lag" associated with file-based processing.
- Interactive Browser/Terminal Clients: Build responsive voice-controlled CLI tools or web interfaces that process audio streams directly from the user's microphone.
Example Prompts
- "Initialize a real-time transcription session for a microphone stream using the qwen3-asr-flash-realtime model at 16000Hz sampling rate."
- "Start listening to the live audio input stream and output the transcript fragments as they arrive, marking final sentences."
- "Prepare a configuration request for the real-time ASR skill, setting the chunk size to 200ms for high-responsiveness in a voice agent context."
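The second prompt above asks for transcript fragments as they arrive, with final sentences marked. One way a client might merge such fragments is sketched below. The event shape (a text string plus an is_final flag) and the TranscriptBuffer class are assumptions made for illustration; they are not the service's actual event schema.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Accumulates finalized sentences and tracks the current partial fragment.

    Hypothetical client-side helper; the (text, is_final) event shape is assumed.
    """

    finals: list = field(default_factory=list)
    partial: str = ""

    def update(self, text: str, is_final: bool) -> str:
        if is_final:
            # A final result replaces whatever partial text preceded it.
            self.finals.append(text)
            self.partial = ""
        else:
            # A newer partial supersedes the previous partial.
            self.partial = text
        return self.render()

    def render(self) -> str:
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)


buf = TranscriptBuffer()
buf.update("hello", is_final=False)
buf.update("hello world", is_final=False)
buf.update("Hello world.", is_final=True)
print(buf.update("how are", is_final=False))  # prints: Hello world. how are
```

Keeping finalized text separate from the partial fragment lets a caption display overwrite only the unstable tail rather than redrawing the whole transcript.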
Tips & Limitations
- Audio Format: Always prefer 16kHz mono PCM for the best balance between quality and latency. Using other formats may require unnecessary transcoding on the client side.
- Chunking: Maintain small chunk sizes (ideally between 100ms and 300ms) to ensure low latency. Larger chunks will result in significant delays in partial result delivery.
- Scope: This skill is strictly for streaming audio. If you are dealing with static, pre-recorded audio files, use the batch alicloud-ai-audio-asr skill instead to save costs and handle longer durations more efficiently.
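To make the format and chunking tips concrete, the sketch below computes how many bytes one chunk occupies and slices a PCM buffer accordingly. It assumes 16-bit (2-byte) samples, which the tips do not state, and the helper names chunk_size_bytes and iter_chunks are hypothetical.

```python
def chunk_size_bytes(sample_rate_hz=16000, chunk_ms=200, channels=1, sample_width_bytes=2):
    """Bytes of raw PCM in one chunk. 16-bit samples are an assumption."""
    return sample_rate_hz * channels * sample_width_bytes * chunk_ms // 1000


def iter_chunks(pcm: bytes, size: int):
    """Yield consecutive slices of `pcm`; the last slice may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


size = chunk_size_bytes()            # 16 kHz mono, 200 ms
one_second_of_silence = bytes(32000)  # 16000 samples * 2 bytes
chunks = list(iter_chunks(one_second_of_silence, size))
print(size, len(chunks))  # prints: 6400 5
```

Under the same assumptions, the recommended 100 ms to 300 ms range corresponds to 3200 to 9600 bytes per chunk. Since a chunk cannot be sent before it has been fully captured, its duration sets a lower bound on how soon a partial result can come back.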
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-cinience-alicloud-ai-audio-asr-realtime": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: network-access, external-api
Related Skills
volcengine-compute-ecs
Manage Volcengine ECS instances and related resources. Use when users need instance inventory, lifecycle operations, troubleshooting, or automation templates for ECS.
alicloud-ai-search-opensearch
Use OpenSearch vector search edition via the Python SDK (ha3engine) to push documents and run HA/SQL searches. Ideal for RAG and vector retrieval pipelines in Claude Code/Codex.
alicloud-storage-oss-ossutil
Alibaba Cloud OSS CLI (ossutil 2.0) skill. Install, configure, and operate OSS from the command line based on the official ossutil overview.
alicloud-platform-openapi-product-api-discovery
Discover and reconcile Alibaba Cloud product catalogs from Ticket System, Support & Service, and BSS OpenAPI; fetch OpenAPI product/version/API metadata; and summarize API coverage to plan new skills. Use when you need a complete product list, product-to-API mapping, or coverage/gap reports for skill generation.
alicloud-ai-image-qwen-image
Generate images with Model Studio DashScope SDK using Qwen Image generation models (qwen-image, qwen-image-plus, qwen-image-max and snapshots). Use when implementing or documenting image.generate requests/responses, mapping prompt/negative_prompt/size/seed/reference_image, or integrating image generation into the video-agent pipeline.