sag
ElevenLabs text-to-speech with mac-style say UX.
Install via CLI (Recommended)
clawhub install openclaw/openclaw/skills/sag
What This Skill Does
The sag skill is a text-to-speech (TTS) utility that uses ElevenLabs for voice generation behind a macOS-style say command. It converts text to speech with selectable voices and models, and supports expressive audio tags for nuanced delivery. It suits natural-sounding voiceovers, accessibility features, and spoken AI agent responses.
Installation
To install the sag skill, use the following command:
clawhub install openclaw/openclaw/skills/sag
This will add the sag skill to your OpenClaw environment. You will need to provide your ElevenLabs API key, preferably through the ELEVENLABS_API_KEY environment variable, or alternatively via SAG_API_KEY.
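A quick pre-flight check can confirm that one of the two variables is set before sag is invoked. The sketch below is illustrative only: resolve_api_key is a hypothetical helper, and the lookup order (ELEVENLABS_API_KEY first, then SAG_API_KEY) is an assumption based on the stated preference, not a description of how sag itself resolves the key.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty key found, or None if neither is set.

    Assumption: ELEVENLABS_API_KEY is checked before SAG_API_KEY, mirroring
    the preference stated above. This is a hypothetical helper, not part of sag.
    """
    env = os.environ if env is None else env
    for name in ("ELEVENLABS_API_KEY", "SAG_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


if __name__ == "__main__":
    if resolve_api_key() is None:
        print("Set ELEVENLABS_API_KEY (or SAG_API_KEY) before running sag.")
    else:
        print("API key found.")
```

Passing a plain dict instead of os.environ makes the helper easy to exercise without touching the real environment.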
Use Cases
- Dynamic AI Responses: Generate spoken replies for AI agents, making interactions more engaging.
- Content Creation: Quickly create voiceovers for videos, podcasts, or presentations.
- Accessibility: Provide auditory feedback for users who prefer spoken information.
- Prototyping: Test different voice styles and emotional tones for applications.
- Personalized Audio: Create custom audio messages with specific voices and inflections.
Example Prompts
sag "Please read this document aloud in a calm voice."
sag speak -v "Crazy Scientist" "Initiate the experiment! [excited] It's time!"
sag voices
Tips & Limitations
- API Key: Ensure your ELEVENLABS_API_KEY or SAG_API_KEY is correctly set.
- Voice Customization: Use sag -v <voice_name_or_id> to select a voice. You can list available voices with sag voices.
- Model Selection: Choose from different ElevenLabs models like eleven_v3 (default, expressive), eleven_multilingual_v2 (stable), or eleven_flash_v2_5 (fast) using the --model flag.
- Pronunciation: For complex words or names, use respelling (e.g., "key-note") or hyphens. The --normalize option (default auto) helps with numbers, units, and URLs. Use --lang to bias language processing.
- Expressive Tags: For eleven_v3, use tags like [whispers], [shouts], [sings], [laughs], [sighs], [sarcastic], [curious], [excited], [crying], [mischievously] for emotional delivery. Use [pause], [short pause], [long pause] for timing.
- SSML Support: Older models (v2, v2.5) support SSML <break> tags. v3 does not directly support SSML <break> but uses the custom pause tags.
- Chat Responses: For voice replies in chat, generate an MP3 file using sag -v <voice> -o <output_path.mp3> "<message>" and then reference it as MEDIA:<output_path.mp3>.
- Default Clawd Voice: Use -v Clawd or the ID lj2rcrvANS3gaWWnczSX for the default Clawd voice character.
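The chat-response tip can be sketched as a small helper that assembles the sag command line and the matching MEDIA: reference. This is a minimal illustration, not part of the skill: build_chat_reply is a hypothetical name, the /tmp/reply.mp3 path is a placeholder, and only the -v and -o flags shown in the tip are used. The helper builds the argument list without running it, so it works even where sag is not installed.

```python
import shlex
from typing import List, Tuple


def build_chat_reply(
    message: str, output_path: str, voice: str = "Clawd"
) -> Tuple[List[str], str]:
    """Build the sag argv and the MEDIA: reference for a voice reply.

    Hypothetical helper: it mirrors the documented form
    sag -v <voice> -o <output_path.mp3> "<message>" and does not execute it.
    """
    argv = ["sag", "-v", voice, "-o", output_path, message]
    media_ref = f"MEDIA:{output_path}"
    return argv, media_ref


if __name__ == "__main__":
    argv, media_ref = build_chat_reply(
        "Build finished. [excited] All tests passed!",
        "/tmp/reply.mp3",  # placeholder output path
    )
    # shlex.join renders the list as a shell-safe command line for display.
    print(shlex.join(argv))
    print(media_ref)
```

To actually generate the file, the argument list could be passed to subprocess.run(argv, check=True), assuming sag is on the PATH and an API key is set. Keeping the message as a single list element avoids shell-quoting problems with brackets and punctuation in expressive tags.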
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-openclaw-sag": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: external-api, network-access, file-write
Related Skills
apple-notes
Create, view, edit, delete, search, move, or export Apple Notes via the memo CLI on macOS.
sherpa-onnx-tts
Local text-to-speech via sherpa-onnx (offline, no cloud)
goplaces
Query Google Places for text search, place details, resolve, reviews, or scriptable JSON via goplaces.
skill-creator
Create, edit, improve, tidy, review, audit, or restructure AgentSkills and SKILL.md files.
video-frames
Extract frames or short clips from videos using ffmpeg.