aimlapi-voice

Transcribe audio files (ogg, mp3, wav, etc.) using AIMLAPI. Use when the user provides audio messages or local audio files. Provides a reliable Python script with retries and polling.

What This Skill Does

The aimlapi-voice skill connects OpenClaw to the AIMLAPI speech-to-text service. Transcription is asynchronous, so the skill wraps file queuing, MIME-type negotiation, and result polling in a single Python script with retries. It uses Whisper models to convert audio such as voice memos, interview recordings, and other clips into plain-text transcripts with minimal configuration.
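The polling step described above can be sketched roughly as follows. This is a minimal illustration and not the skill's actual script: fetch_status is a hypothetical callable standing in for whatever request checks the job, and the "status" field with its "completed" and "failed" values is an assumption about the response shape.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    max_wait: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until the job finishes or max_wait seconds pass.

    fetch_status is a hypothetical callable returning a dict with a
    "status" key; the real API's field names may differ.
    """
    deadline = clock() + max_wait
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"transcription failed: {result}")
        # Give up once the deadline has passed instead of waiting forever.
        if clock() >= deadline:
            raise TimeoutError(f"no result after {max_wait} seconds")
        sleep(interval)
```

Passing sleep and clock as parameters is only a convenience for exercising the loop without real delays; a production script would normally leave the defaults in place.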

Installation

To integrate this skill into your local environment, ensure you have the OpenClaw CLI tool installed. Run the following command in your terminal:

clawhub install openclaw/skills/skills/aimlapihello/aiml-voice

After installation, configure your authentication credentials. The skill reads the API key from the AIMLAPI_API_KEY environment variable. Export it in your shell profile (e.g., export AIMLAPI_API_KEY="your_key_here"), or pass a key file with the --apikey-file argument when running the transcription task. Python 3 must be installed and available on your system PATH.
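As an illustration of the two credential sources, a script might resolve the key along these lines. The function name resolve_api_key is hypothetical, and the precedence shown (key file first, then environment variable) is an assumption; the skill's own script may order them differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(
    apikey_file: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the API key from a key file if given, else from the environment.

    Assumption: an explicit key file takes precedence over AIMLAPI_API_KEY.
    """
    if apikey_file:
        key = Path(apikey_file).read_text(encoding="utf-8").strip()
    else:
        key = env.get("AIMLAPI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "No API key found: set AIMLAPI_API_KEY or pass a key file"
        )
    return key
```

Stripping whitespace guards against a trailing newline in the key file, which is a common cause of authentication failures.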

Use Cases

Transcription Automation: Convert hours of voice memos or meeting recordings into text for indexing and searchability.
Content Creation: Transcribe audio tracks exported from video projects to produce draft scripts or accessibility transcripts.
Data Ingestion: Process user-submitted audio files from messaging platforms within an automated pipeline.
Research Support: Automate the transcription of field interviews, reducing manual effort during qualitative research.

Example Prompts

"Transcribe the voice message I just uploaded to the downloads folder using the medium whisper model."
"Please process the audio recording from the board meeting and save the output to meeting_transcript.txt."
"Convert the audio file 'client_interview.ogg' to text and output the result to my logs."

Tips & Limitations

Before starting a task, verify that your audio files are not corrupted and are in a supported format. The skill defaults to the #g1_whisper-medium model; use the --model flag to try other available models and trade speed against accuracy. For long audio files, raise --max-wait above its default of 300 seconds so the skill does not time out before the API returns the result. Keep your API keys secure and rotate them periodically.
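A quick pre-flight check before submitting a file might look like the sketch below. The preflight function and the ASSUMED_EXTENSIONS set are illustrative only: the set lists just the formats named in the skill description (ogg, mp3, wav), and the real list of supported formats may be longer.

```python
import mimetypes
from pathlib import Path

# Assumption: only the formats named in the skill description are listed here.
ASSUMED_EXTENSIONS = {".ogg", ".mp3", ".wav"}


def preflight(path: str) -> str:
    """Check that an audio file looks usable and return a guessed MIME type."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"not a file: {path}")
    if p.stat().st_size == 0:
        raise ValueError(f"file is empty: {path}")
    if p.suffix.lower() not in ASSUMED_EXTENSIONS:
        raise ValueError(f"unexpected extension: {p.suffix!r}")
    # guess_type relies on the file name only; it does not inspect contents.
    mime, _ = mimetypes.guess_type(p.name)
    return mime or "application/octet-stream"
```

This catches missing, empty, or misnamed files early, but it cannot detect a corrupted file whose name and size look normal.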

Metadata

Author: @aimlapihello

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-aimlapihello-aiml-voice": {
      "enabled": true,
      "auto_update": true
    }
  }
}
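
If you would rather not edit the file by hand, the entry above could be merged into an existing clawhub.json with a short script such as this one. The helper name enable_plugin is hypothetical, and the sketch assumes the file is plain JSON with a top-level "plugins" object, as in the snippet above.

```python
import json
from pathlib import Path


def enable_plugin(config_path: str, plugin_id: str, auto_update: bool = True) -> dict:
    """Add or update one plugin entry while keeping other settings intact."""
    path = Path(config_path)
    # Start from an empty config if the file does not exist yet.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, enable_plugin("clawhub.json", "official-aimlapihello-aiml-voice") would write the same entry shown above and leave any other plugins in the file unchanged.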

Tags (AI)

#transcription #audio #speech-to-text #whisper #voice-recognition

Safety Score: 4/5

Flags: file-read, file-write, external-api, code-execution

Related Skills

aimlapi-embeddings

Generate text embeddings via AIMLAPI. Use for semantic search, clustering, or high-dimensional text representations with text-embedding-3-large and other models.

aimlapihello 4473

aimlapi-media-gen

Generate images or videos via AIMLAPI from prompts. Use when Codex needs reliable AI/ML API media generation with retries, explicit User-Agent headers, and async video polling.

aimlapihello 4473

aimlapi-safety

Content moderation and safety checks. Instantly classify text or images as safe or unsafe using AI guardrails.

aimlapihello 4473

aimlapi-llm-reasoning

Run AIMLAPI LLM and reasoning workflows through chat completions with retries, structured outputs, and explicit User-Agent headers. Use when Codex needs scripted prompting/reasoning calls against AIMLAPI models.

aimlapihello 4473

aimlapi-music

Generate high-quality music/songs via AIMLAPI. Supports Suno, Udio, Minimax, and ElevenLabs music models. Use when the user asks for music, songs, or soundtracks with specific lyrics or styles.

aimlapihello 4473