Pocket TTS
Skill by sherajdev
Why use this skill?
Generate high-quality voice audio locally using the Pocket TTS skill. Runs offline on CPU, supports voice cloning, and features 8 natural-sounding English voices.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/sherajdev/pocket-tts
What This Skill Does
The Pocket TTS skill by sherajdev brings real-time text-to-speech to your local machine using Kyutai’s Pocket TTS model. Unlike cloud-based alternatives, it runs entirely offline once the model has been downloaded, so your text and audio never leave the machine and there is no network round-trip latency. It is lightweight, running on just two CPU cores with no dedicated GPU. The model ships with eight built-in voices and supports voice cloning: given a reference WAV file, it can generate speech that sounds like a specific speaker. Whether you are building an interactive AI agent, generating audio for creative projects, or adding accessibility features to local applications, the skill offers both a Python API and a CLI.
Installation
Before you start, make sure you have accepted the license agreement for the Kyutai Pocket TTS model on Hugging Face. You can install the skill through the OpenClaw CLI with: clawhub install openclaw/skills/skills/sherajdev/pocket-tts. Alternatively, in a standard Python environment, use pip install pocket-tts or uvx pocket-tts. The model downloads its parameters (~100M) automatically on first execution, so the initial setup needs a stable internet connection; after that, the skill runs offline.
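After installing, it can help to confirm that the package is visible to your environment before wiring it into an agent. The sketch below uses only the standard library. The import name pocket_tts and the executable name pocket-tts are assumptions inferred from the pip package name, not confirmed by this listing.

```python
import importlib.util
import shutil


def pocket_tts_available(module_name: str = "pocket_tts",
                         cli_name: str = "pocket-tts") -> dict:
    """Report whether the Python module and the CLI executable can be found.

    Both default names are assumptions based on the pip package name.
    """
    return {
        # find_spec returns None when the top-level module is not installed
        "module": importlib.util.find_spec(module_name) is not None,
        # shutil.which returns None when no executable is on PATH
        "cli": shutil.which(cli_name) is not None,
    }


status = pocket_tts_available()
print(status)
```

If both values are False after a pip install, the package was most likely installed into a different Python environment than the one running the check.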
Use Cases
This skill suits developers building local-only AI agents that need voice output. Because it runs on CPU, it can be deployed on edge devices, laptops, or servers without expensive hardware. Use it to give spoken feedback for automation tasks, narrate local media projects, or build personalized AI personas with voice cloning.
Example Prompts
- "Speak the following text using the alba voice: 'System status is optimal and all services are running.'"
- "Generate an audio file named briefing.wav using the javert voice with a speed of 1.1x."
- "Clone my voice from recording.wav and use it to say 'Hello, how can I assist you with your tasks today?'"
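The voice-cloning prompt above depends on a readable reference recording. As a rough pre-flight check, the sketch below inspects a WAV file with Python's standard wave module. The helper names and the 3-second minimum are hypothetical choices for illustration, not requirements stated by the skill. Note that the wave module reads PCM WAV only, so other encodings are reported as unusable here even if the model itself might accept them.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file.

    Raises wave.Error (or EOFError/OSError) if the file cannot be parsed.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


def is_usable_reference(path: str, min_seconds: float = 3.0) -> bool:
    """True if the file parses as PCM WAV and is at least min_seconds long.

    min_seconds is an illustrative threshold, not a documented limit.
    """
    try:
        info = describe_wav(path)
    except (wave.Error, OSError, EOFError):
        return False
    return info["duration_s"] >= min_seconds
```

Calling is_usable_reference("recording.wav") before issuing the cloning prompt catches a missing or corrupt file early, rather than after the model has loaded.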
Tips & Limitations
The model is currently optimized for English output (v1). Speed can be adjusted between 0.5x and 2.0x, but staying close to 1.0x usually yields the most natural inflection. Voice cloning requires a valid local WAV file; use clear, high-quality samples for the best results. Because everything runs locally, performance depends on your CPU, though the model typically runs at 2-6x real-time speed.
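To make these figures concrete: at 2-6x real-time speed, one minute of audio should take roughly 10 to 30 seconds to synthesize. The sketch below works through that arithmetic and clamps a requested speed to the stated 0.5x to 2.0x range. The function names are hypothetical, and whether the skill itself clamps or rejects out-of-range speeds is not stated in this listing.

```python
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # speed range stated in the listing


def clamp_speed(speed: float) -> float:
    """Clamp a requested speed multiplier into the supported range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


def synthesis_time_range(audio_seconds: float,
                         rtf_low: float = 2.0,
                         rtf_high: float = 6.0) -> tuple[float, float]:
    """Return (best_case, worst_case) wall-clock seconds to synthesize audio.

    A real-time factor of N means N seconds of audio per second of compute.
    """
    return audio_seconds / rtf_high, audio_seconds / rtf_low


print(clamp_speed(2.5))            # 2.0
print(synthesis_time_range(60.0))  # (10.0, 30.0)
```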
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-sherajdev-pocket-tts": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read
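If your clawhub.json already contains other settings, pasting the fragment by hand risks overwriting them. The sketch below merges the plugin entry into an existing file with Python's json module. The file name, plugin key, and the two flags come from the fragment in this listing; the helper name and the assumption that the file is plain JSON with a top-level "plugins" object are illustrative.

```python
import json
from pathlib import Path

PLUGIN_ID = "official-sherajdev-pocket-tts"  # key taken from the listing


def enable_plugin(config_path: Path, plugin_id: str = PLUGIN_ID) -> dict:
    """Add or update the plugin entry, leaving other settings untouched."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}  # start a new config if none exists yet
    plugins = config.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": True}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, enable_plugin(Path("clawhub.json")) updates the file in the current directory and returns the merged configuration.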