deAPI Audio

Text-to-speech, voice cloning, voice design, and audio transcription via deAPI decentralized GPU network.

Scripts

| Script | Use when... |
| --- | --- |
| `scripts/text-to-speech.sh` | User wants to convert text to spoken audio |
| `scripts/voice-clone.sh` | User wants to clone/replicate a voice from a sample audio file |
| `scripts/voice-design.sh` | User wants to generate speech with a voice described in natural language |
| `scripts/speech-to-text.sh` | User wants to transcribe an audio file (AAC, MP3, OGG, WAV, WebM, FLAC, max 10 MB) |

Your config

! cat ${CLAUDE_SKILL_DIR}/config.json 2>/dev/null || echo "NOT_CONFIGURED"

If the config above is NOT_CONFIGURED, ask the user:

What is your deAPI API key? (get one at https://deapi.ai, free $5 credit)

Then write the answer to ${CLAUDE_SKILL_DIR}/config.json as { "api_key": "their_key" }.

Alternatively, the user can set the DEAPI_API_KEY environment variable directly, which takes priority over config.json.
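
The lookup order described above can be sketched as a small shell function. This is only an illustration: `resolve_api_key` is a hypothetical helper name, not something the bundled scripts are known to define, and the `sed` extraction assumes a flat config.json whose key contains no escaped quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill's scripts): resolve the deAPI key.
# DEAPI_API_KEY takes priority; otherwise fall back to config.json.
resolve_api_key() {
  local config="${CLAUDE_SKILL_DIR:-.}/config.json"
  local key=""

  if [ -n "${DEAPI_API_KEY:-}" ]; then
    # The environment variable wins when it is set and non-empty.
    key="$DEAPI_API_KEY"
  elif [ -f "$config" ]; then
    # Minimal extraction of the "api_key" string value.
    # Assumes a flat JSON object such as { "api_key": "their_key" }.
    key=$(sed -n 's/.*"api_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)
  fi

  if [ -n "$key" ]; then
    printf '%s\n' "$key"
  else
    # Same sentinel the config check above prints.
    echo "NOT_CONFIGURED"
    return 1
  fi
}
```

Calling `resolve_api_key` prints the key, or prints NOT_CONFIGURED and returns a non-zero status when neither source provides one.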

Gotchas

For YouTube/video transcription, use the deapi-video skill instead. This skill handles audio-only files (.mp3, .wav, .m4a, .flac, .ogg).
Three TTS models: Kokoro (default), Chatterbox, Qwen3. Use --model Chatterbox or --model Qwen3 to switch.
Kokoro: Voice ID format is {lang}{gender}_{name}. Language is auto-detected from voice prefix if --lang is omitted.
Chatterbox: voice is always default, speed is fixed at 1, supports 22 languages. Text limit 10-2000 chars.
Kokoro: text limit 3-10001 chars. Long text may time out; split it into segments and generate each one separately.
TTS output format defaults to mp3. WAV files are much larger but lossless.
Kokoro: speed range is 0.5-2.0. Values outside this range cause errors.
Qwen3 Voice Clone (voice-clone.sh): reference audio must be 5-15 seconds long. Clips that are too short or too long degrade quality. Formats: MP3, WAV, FLAC, OGG, M4A. URLs are downloaded automatically.
Qwen3 Voice Design (voice-design.sh): quality depends on the --instruct description. Encourage specific details: gender, age, accent, speaking style, emotion.
Qwen3 models use full language names (English, French, etc.) NOT language codes. 10 supported languages: English, Italian, Spanish, Portuguese, Russian, French, German, Korean, Japanese, Chinese.
Qwen3 TTS (--model Qwen3): 9 voices available, default Vivian. The Ryan voice is not available for Chinese.
Qwen3 text limit is 10-5000 chars. Speed is fixed at 1. Voice Clone and Voice Design use voice=default.
Audio transcription accepts a local file path or URL (--audio). Formats: AAC, MP3, OGG, WAV, WebM, FLAC. Max 10 MB.
Result URLs expire in 24 hours. Download promptly.
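
Several of the limits above can be checked locally before a request is sent. The following is a minimal sketch, assuming the limits exactly as listed in this section; the function names are hypothetical and not part of the bundled scripts, and "10 MB" is interpreted conservatively as 10,000,000 bytes, which is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks based on the limits listed in this section.

# Check text length against the per-model limits (Kokoro, Chatterbox, Qwen3).
check_text_length() {
  local model="$1" text="$2" min max len
  case "$model" in
    Kokoro)     min=3;  max=10001 ;;
    Chatterbox) min=10; max=2000 ;;
    Qwen3)      min=10; max=5000 ;;
    *) echo "unknown model: $model" >&2; return 2 ;;
  esac
  len=${#text}
  if [ "$len" -lt "$min" ] || [ "$len" -gt "$max" ]; then
    echo "$model text must be $min-$max chars, got $len" >&2
    return 1
  fi
}

# Check that a Kokoro speed value is a plain number within 0.5-2.0.
check_kokoro_speed() {
  awk -v s="$1" 'BEGIN {
    ok = (s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 2.0)
    exit !ok
  }'
}

# Check that a local audio file is within the transcription size limit.
# Assumption: 10 MB is treated as 10,000,000 bytes. URLs are not handled here.
check_audio_size() {
  local file="$1" max_bytes=10000000 size
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }
  size=$(( $(wc -c < "$file") ))
  [ "$size" -le "$max_bytes" ]
}
```

For example, `check_text_length Chatterbox "$text" && check_audio_size recording.mp3` would fail early instead of waiting for an API error.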

Quick examples

# Basic TTS
bash scripts/text-to-speech.sh --text "Hello world"

# British voice
bash scripts/text-to-speech.sh --text "Good morning" --voice bf_emma
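
The Kokoro gotcha about long text timing out suggests splitting the input and generating each segment separately. One way to sketch that uses `fold -s`, which wraps at spaces. The helper names and the 1000-character segment size are assumptions chosen for illustration; `fold` counts bytes or columns rather than characters, so non-ASCII text may split at slightly different lengths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split stdin into segments of at most $1 columns,
# breaking at spaces where possible (default 1000, an arbitrary choice).
split_text() {
  local max="${1:-1000}"
  fold -s -w "$max"
}

# Hypothetical wrapper: generate speech for each segment in turn.
# Uses only the --text flag shown in the examples above.
speak_long_text() {
  local text="$1" segment
  printf '%s\n' "$text" | split_text 1000 | while IFS= read -r segment; do
    # Skip blank segments; very short ones may still fall under the 3-char minimum.
    [ -n "${segment// /}" ] || continue
    bash scripts/text-to-speech.sh --text "$segment"
  done
}

# Usage (commented out so sourcing this file has no side effects):
# speak_long_text "$(cat chapter.txt)"
```

Each segment produces its own result, so the resulting audio files would still need to be downloaded and joined separately.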
