Community Verified, media, Safety 4/5

sag

ElevenLabs text-to-speech with mac-style say UX.

What This Skill Does

The sag skill is a text-to-speech (TTS) utility that uses ElevenLabs for voice generation and mimics the macOS say command. It converts text into speech with a choice of voices and models, and it supports expressive audio tags that shape delivery. Typical uses include voiceovers, accessibility features, and spoken responses from AI agents.

Installation

To install the sag skill, use the following command:

clawhub install openclaw/openclaw/skills/sag

This will add the sag skill to your OpenClaw environment. You will need to provide your ElevenLabs API key, preferably through the ELEVENLABS_API_KEY environment variable, or alternatively via SAG_API_KEY.
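
The lookup described above can be checked before invoking the skill. The following is a minimal Python sketch, not part of sag itself: resolve_api_key is a hypothetical helper, and the assumption that ELEVENLABS_API_KEY takes precedence over SAG_API_KEY is inferred from the wording "preferably" rather than confirmed by the tool.

```python
import os
from typing import Mapping, Optional

# Variable names come from the text above. The precedence order
# (ELEVENLABS_API_KEY first, SAG_API_KEY second) is an assumption.
KEY_VARS = ("ELEVENLABS_API_KEY", "SAG_API_KEY")


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty key found, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    for name in KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


if __name__ == "__main__":
    if resolve_api_key() is None:
        print("No API key found; set " + " or ".join(KEY_VARS))
    else:
        print("API key found")
```

Passing a plain dictionary instead of os.environ keeps the helper easy to test without touching the real environment.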

Use Cases

  • Dynamic AI Responses: Generate spoken replies for AI agents, making interactions more engaging.
  • Content Creation: Quickly create voiceovers for videos, podcasts, or presentations.
  • Accessibility: Provide auditory feedback for users who prefer spoken information.
  • Prototyping: Test different voice styles and emotional tones for applications.
  • Personalized Audio: Create custom audio messages with specific voices and inflections.

Example Prompts

  1. sag "Please read this document aloud in a calm voice."
  2. sag speak -v "Crazy Scientist" "Initiate the experiment! [excited] It's time!"
  3. sag voices

Tips & Limitations

  • API Key: Ensure your ELEVENLABS_API_KEY or SAG_API_KEY is correctly set.
  • Voice Customization: Use sag -v <voice_name_or_id> to select a voice. You can list available voices with sag voices.
  • Model Selection: Choose from different ElevenLabs models like eleven_v3 (default, expressive), eleven_multilingual_v2 (stable), or eleven_flash_v2_5 (fast) using the --model flag.
  • Pronunciation: For complex words or names, use respelling (e.g., "key-note") or hyphens. The --normalize option (default auto) helps with numbers, units, and URLs. Use --lang to bias language processing.
  • Expressive Tags: For eleven_v3, use tags like [whispers], [shouts], [sings], [laughs], [sighs], [sarcastic], [curious], [excited], [crying], [mischievously] for emotional delivery. Use [pause], [short pause], [long pause] for timing.
  • SSML Support: Older models (v2, v2.5) support SSML <break> tags. eleven_v3 does not directly support SSML <break>; use the [pause], [short pause], and [long pause] tags instead.
  • Chat Responses: For voice replies in chat, generate an MP3 file using sag -v <voice> -o <output_path.mp3> "<message>" and then reference it as MEDIA:<output_path.mp3>.
  • Default Clawd Voice: Use -v Clawd or the ID lj2rcrvANS3gaWWnczSX for the default Clawd voice character.
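
The chat-response tip above can be sketched in Python as a pair of small helpers that assemble the sag argument list and the MEDIA reference. This is an illustration under stated assumptions: build_sag_command and media_reference are hypothetical names, only the flags mentioned in the tips (-v, -o, --model) are used, and placing --model before the message is a guess about argument order.

```python
import shlex
from typing import List, Optional

# Model names as listed in the tips above.
KNOWN_MODELS = ("eleven_v3", "eleven_multilingual_v2", "eleven_flash_v2_5")


def build_sag_command(message: str, voice: str, output_path: str,
                      model: Optional[str] = None) -> List[str]:
    """Build the argv for: sag -v <voice> -o <output_path> "<message>"."""
    if model is not None and model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model}")
    argv = ["sag", "-v", voice, "-o", output_path]
    if model is not None:
        # Assumption: --model may appear before the message text.
        argv += ["--model", model]
    argv.append(message)
    return argv


def media_reference(output_path: str) -> str:
    """Return the MEDIA:<path> reference used for voice replies in chat."""
    return f"MEDIA:{output_path}"


if __name__ == "__main__":
    argv = build_sag_command("[excited] Build passed!", "Clawd", "reply.mp3")
    print(shlex.join(argv))  # shell-quoted form, for inspection only
    print(media_reference("reply.mp3"))
```

The helpers only build the command. Running it requires sag on the PATH and a valid API key, for example by passing the list to subprocess.run(argv, check=True). Building a list rather than a shell string avoids quoting problems when the message contains brackets or quotes.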

Metadata

Author: @openclaw
Stars: 289479
Views: 40
Updated: 2026-03-09

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-openclaw-sag": {
      "enabled": true,
      "auto_update": true
    }
  }
}
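
If clawhub.json already lists other plugins, pasting the snippet over the file would drop them. The following Python sketch merges the entry instead. It is an assumption-laden illustration: add_sag_plugin is a hypothetical helper, and it assumes the file is plain JSON with a top-level "plugins" object, as the snippet above suggests.

```python
import json

# Plugin entry copied from the configuration snippet above.
SAG_PLUGIN_NAME = "official-openclaw-sag"
SAG_PLUGIN_SETTINGS = {"enabled": True, "auto_update": True}


def add_sag_plugin(config_text: str) -> str:
    """Return config JSON with the sag entry merged in (hypothetical helper)."""
    # Treat an empty file as an empty configuration.
    config = json.loads(config_text) if config_text.strip() else {}
    plugins = config.setdefault("plugins", {})
    # Copy the settings so callers cannot mutate the module-level constant.
    plugins[SAG_PLUGIN_NAME] = dict(SAG_PLUGIN_SETTINGS)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    existing = '{"plugins": {"some-other-plugin": {"enabled": true}}}'
    print(add_sag_plugin(existing))
```

The function works on text rather than a file path, so the location of clawhub.json stays up to the caller.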

Tags (AI)

#tts #elevenlabs #voice #audio

Safety Score: 4/5

Flags: external-api, network-access, file-write