Official Verified media Safety 5/5

Pocket TTS

Skill by sherajdev

Why use this skill?

Generate high-quality voice audio locally using the Pocket TTS skill. Runs offline on CPU, supports voice cloning, and features 8 natural-sounding English voices.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/sherajdev/pocket-tts

What This Skill Does

The Pocket TTS skill by sherajdev brings real-time text-to-speech to your local machine using Kyutai’s Pocket TTS model. Unlike cloud-based alternatives, it runs entirely offline once the model is downloaded, so your text never leaves the machine and no network round trips add latency or usage costs. It is lightweight, running on two CPU cores without a dedicated GPU. The model ships with eight built-in voices and supports voice cloning: given a reference WAV file, it can generate speech that sounds like a specific speaker. Whether you are building an interactive AI agent, generating audio for creative projects, or adding accessibility features to local applications, the skill offers a Python API and a CLI.

Installation

Before you start, accept the license agreement for the Kyutai Pocket TTS model on Hugging Face. Install the skill with the OpenClaw CLI: clawhub install openclaw/skills/skills/sherajdev/pocket-tts. Alternatively, in a standard Python environment, install it with pip install pocket-tts or run it with uvx pocket-tts. The model downloads its weights (about 100M parameters) on first run, so make sure you have a stable connection for the initial setup.
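
To see which of the install routes above is available on a given machine, a small check like the following may help. It is only a sketch: it assumes the pip distribution is named pocket-tts (as in the pip command above) and that the package exposes an executable of the same name, which is inferred from the uvx command rather than confirmed here. The function name pocket_tts_status is hypothetical.

```python
import shutil
from importlib import metadata


def pocket_tts_status(dist_name="pocket-tts", cli_name="pocket-tts"):
    """Report which install routes are visible in this environment.

    dist_name and cli_name are assumptions taken from the commands
    quoted in the Installation text, not from the package itself.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "pip_version": version,              # None if not installed in this environment
        "cli_path": shutil.which(cli_name),  # None if no such executable is on PATH
        "uvx_path": shutil.which("uvx"),     # None if uv is not installed
    }


if __name__ == "__main__":
    for key, value in pocket_tts_status().items():
        print(f"{key}: {value or 'not found'}")
```

If all three entries report "not found", none of the Python routes is set up yet; the clawhub install path is separate and is not checked by this sketch.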

Use Cases

This skill suits developers building local-only AI agents that need voice output. Because it runs on CPU, it can be deployed on edge devices, laptops, or servers without a GPU. Use it to give natural-sounding voice feedback for automation tasks, to narrate local media projects, or to build personalized AI personas through voice cloning.

Example Prompts

  1. "Speak the following text using the alba voice: 'System status is optimal and all services are running.'"
  2. "Generate an audio file named briefing.wav using the javert voice with a speed of 1.1x."
  3. "Clone my voice from recording.wav and use it to say 'Hello, how can I assist you with your tasks today?'"

Tips & Limitations

The model currently targets English output (v1). Speed can be adjusted between 0.5x and 2.0x, but staying close to 1.0x usually yields the most natural inflection. Voice cloning requires a valid local WAV file; use clean, high-quality input samples for the best results. Because everything runs locally on the CPU, performance depends on your processor, though generation typically runs at 2-6x real-time speed.
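
The limits above can be checked before a request reaches the skill. The sketch below uses only the Python standard library and does not call Pocket TTS itself; the helper names (clamp_speed, estimate_synthesis_seconds, describe_reference_wav) are hypothetical. Note that the standard wave module reads uncompressed PCM WAV files, so a file it rejects may still be a WAV in another encoding.

```python
import wave

MIN_SPEED, MAX_SPEED = 0.5, 2.0  # speed range stated in the tips


def clamp_speed(speed):
    """Keep a requested speed inside the supported 0.5x-2.0x range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


def estimate_synthesis_seconds(audio_seconds, realtime_factor):
    """Wall-clock time needed to produce audio at a multiple of real time."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


def describe_reference_wav(path):
    """Return basic properties of a PCM WAV file; raises wave.Error if unreadable."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

As a worked example of the 2-6x figure: one minute of speech would take about 30 seconds to generate at 2x real time (60 / 2) and about 10 seconds at 6x (60 / 6).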

Metadata

Author: @sherajdev
Stars: 1015
Views: 1
Updated: 2026-02-15

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-sherajdev-pocket-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
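
If clawhub.json already contains other settings, pasting the snippet by hand risks overwriting them. The following sketch merges the entry instead. It assumes clawhub.json is plain JSON with a top-level "plugins" object, as the snippet above suggests, and that the file sits in the current directory; both the path and the function names are illustrative.

```python
import json
from pathlib import Path

PLUGIN_KEY = "official-sherajdev-pocket-tts"  # key taken from the snippet above


def enable_plugin(config, key=PLUGIN_KEY):
    """Return a copy of config with the plugin entry enabled.

    Other plugins and top-level settings are preserved unchanged.
    """
    merged = dict(config)
    plugins = dict(merged.get("plugins", {}))
    entry = dict(plugins.get(key, {}))
    entry.update({"enabled": True, "auto_update": True})
    plugins[key] = entry
    merged["plugins"] = plugins
    return merged


def update_config_file(path="clawhub.json"):
    """Read the config (if present), merge the plugin entry, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(enable_plugin(config), indent=2) + "\n", encoding="utf-8")
```

Calling update_config_file() on a missing file writes exactly the configuration shown above; on an existing file it adds or updates only the Pocket TTS entry.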

Tags (AI)

#tts #offline #voice-cloning #audio #local-ai
Safety Score: 5/5

Flags: file-write, file-read