
deapi-audio

Text-to-speech, voice cloning, voice design, and audio file transcription via the deAPI GPU network. Trigger on 'text to speech', 'TTS', 'generate voice', 'read aloud', 'voice clone', 'clone voice', 'voice design', 'design voice', 'custom voice', 'transcribe audio', 'STT'. For video/YouTube transcription, use deapi-video instead.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/aleglowa/deapi-audio

deAPI Audio

Text-to-speech, voice cloning, voice design, and audio transcription via the deAPI decentralized GPU network.

Scripts

Script                       Use when...
scripts/text-to-speech.sh    User wants to convert text to spoken audio
scripts/voice-clone.sh       User wants to clone/replicate a voice from a sample audio file
scripts/voice-design.sh      User wants to generate speech with a voice described in natural language
scripts/speech-to-text.sh    User wants to transcribe an audio file (AAC, MP3, OGG, WAV, WebM, FLAC, max 10 MB)

Your config

! cat ${CLAUDE_SKILL_DIR}/config.json 2>/dev/null || echo "NOT_CONFIGURED"

If the config above is NOT_CONFIGURED, ask the user for their deAPI API key.

Then write the answer to ${CLAUDE_SKILL_DIR}/config.json as { "api_key": "their_key" }.

Alternatively, the user can set the DEAPI_API_KEY environment variable directly, which takes priority over config.json.
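
The lookup order described above (environment variable first, then config.json) can be sketched as a small shell function. This is an illustration only: `resolve_api_key` is a hypothetical helper, not one of the skill's scripts, and the `sed` pattern assumes a flat config shaped like the `{ "api_key": "their_key" }` example. A real JSON parser would be more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the deAPI key, preferring DEAPI_API_KEY
# over the "api_key" field in config.json.
resolve_api_key() {
  local config_file="${1:-${CLAUDE_SKILL_DIR:-.}/config.json}"
  local key=""

  # 1. The environment variable takes priority.
  if [ -n "${DEAPI_API_KEY:-}" ]; then
    printf '%s\n' "$DEAPI_API_KEY"
    return 0
  fi

  # 2. Fall back to config.json (assumes a simple, flat JSON object).
  if [ -r "$config_file" ]; then
    key=$(sed -n 's/.*"api_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config_file" | head -n 1)
    if [ -n "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  fi

  # 3. Nothing found: report on stderr and fail.
  echo "NOT_CONFIGURED" >&2
  return 1
}
```

Calling `resolve_api_key` with no argument reads `${CLAUDE_SKILL_DIR}/config.json`; passing a path overrides that location.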

Gotchas

  • For YouTube/video transcription, use the deapi-video skill instead. This skill handles audio-only files (.mp3, .wav, .m4a, .flac, .ogg).
  • Three TTS models: Kokoro (default), Chatterbox, Qwen3. Use --model Chatterbox or --model Qwen3 to switch.
  • Kokoro: Voice ID format is {lang}{gender}_{name}. Language is auto-detected from voice prefix if --lang is omitted.
  • Chatterbox: voice is always default, speed is fixed at 1, supports 22 languages. Text limit 10-2000 chars.
  • Kokoro: text limit 3-10001 chars. Long text may time out; split it into segments and generate each one separately.
  • TTS output format defaults to mp3. WAV files are much larger but lossless.
  • Kokoro: speed range is 0.5-2.0. Values outside this range cause errors.
  • Qwen3 Voice Clone (voice-clone.sh): ref audio must be 5-15 seconds. Too short or too long degrades quality. Formats: MP3, WAV, FLAC, OGG, M4A. URLs are downloaded automatically.
  • Qwen3 Voice Design (voice-design.sh): quality depends on the --instruct description. Encourage specific details: gender, age, accent, speaking style, emotion.
  • Qwen3 models use full language names (English, French, etc.) NOT language codes. 10 supported languages: English, Italian, Spanish, Portuguese, Russian, French, German, Korean, Japanese, Chinese.
  • Qwen3 TTS (--model Qwen3): 9 voices available, default Vivian. The Ryan voice is not available for Chinese.
  • Qwen3 text limit is 10-5000 chars. Speed is fixed at 1. Voice Clone and Voice Design use voice=default.
  • Audio transcription accepts a local file path or URL (--audio). Formats: AAC, MP3, OGG, WAV, WebM, FLAC. Max 10 MB.
  • Result URLs expire in 24 hours. Download promptly.
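
The per-model text and speed limits listed above can be checked before a request is sent. The sketch below is one way to do that, using a hypothetical `check_tts_input` function that is not part of the skill. The numeric limits are copied from the notes above. Lengths use the shell's `${#text}`, which counts characters in the current locale and may not match how the API counts them.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: model name, text, optional speed (default 1).
# Returns 0 if the input fits the documented limits, non-zero otherwise.
check_tts_input() {
  local model="$1" text="$2" speed="${3:-1}"
  local min max
  local len=${#text}

  # Text length limits per model, as listed in the notes above.
  case "$model" in
    Kokoro)     min=3;  max=10001 ;;
    Chatterbox) min=10; max=2000 ;;
    Qwen3)      min=10; max=5000 ;;
    *) echo "unknown model: $model" >&2; return 2 ;;
  esac

  if [ "$len" -lt "$min" ] || [ "$len" -gt "$max" ]; then
    echo "$model text must be $min-$max chars (got $len)" >&2
    return 1
  fi

  # Kokoro accepts speeds from 0.5 to 2.0; the other models are fixed at 1.
  if [ "$model" = "Kokoro" ]; then
    awk -v s="$speed" 'BEGIN { exit !(s >= 0.5 && s <= 2.0) }' || {
      echo "Kokoro speed must be 0.5-2.0 (got $speed)" >&2
      return 1
    }
  elif [ "$speed" != "1" ]; then
    echo "$model speed is fixed at 1 (got $speed)" >&2
    return 1
  fi
}
```

For example, `check_tts_input Kokoro "Hello world" 1.5` succeeds, while `check_tts_input Chatterbox "Hi"` fails because the text is shorter than 10 characters.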

Quick examples

# Basic TTS
bash scripts/text-to-speech.sh --text "Hello world"

# British voice
bash scripts/text-to-speech.sh --text "Good morning" --voice bf_emma

Metadata

Author: @aleglowa
Stars: 4473
Views: 1
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-aleglowa-deapi-audio": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.