screen-narrator
Live narration of your macOS screen activity with Gemini vision + ElevenLabs speech.
Why use this skill?
Transform your macOS screen activity into real-time audio narration with Gemini Vision and ElevenLabs, using customizable styles for productivity or fun.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/buddyh/narrator
What This Skill Does
The screen-narrator skill provides real-time spoken commentary on your macOS desktop. It uses Gemini Vision to analyze what is on screen and ElevenLabs text-to-speech to voice the result as an audio narrative. Whether you are reviewing work, monitoring logs, or exploring creative uses, you can choose from several narrative styles, ranging from professional sports commentary to humorous noir or ASMR. While it runs, you can switch narrative styles, pause narration, and adjust the profanity setting in real time through local JSON command files.
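The flow described above can be pictured as a capture, describe, speak loop. The sketch below shows only that structure and is not the skill's code: `capture`, `describe`, and `speak` are hypothetical callables standing in for a screenshot, a Gemini Vision request, and ElevenLabs TTS playback, and both the polling interval and the "don't repeat identical narration" check are illustrative choices rather than documented behavior.

```python
import time
from typing import Callable, Optional

# Hypothetical stage signatures; the skill's real internals are not documented here.
CaptureFn = Callable[[], bytes]           # returns a screenshot (e.g. PNG bytes)
DescribeFn = Callable[[bytes, str], str]  # (image, style) -> narration text, e.g. via Gemini
SpeakFn = Callable[[str], None]           # voices the text, e.g. via ElevenLabs TTS


def narrate_once(capture: CaptureFn, describe: DescribeFn, speak: SpeakFn,
                 style: str, last_text: Optional[str] = None) -> str:
    """Run one capture -> describe -> speak cycle, skipping repeated text."""
    frame = capture()
    text = describe(frame, style)
    # Illustrative choice: only speak when the narration actually changed.
    if text and text != last_text:
        speak(text)
    return text


def run_narrator(capture: CaptureFn, describe: DescribeFn, speak: SpeakFn,
                 style: str = "sports", cycles: int = 3, interval_s: float = 5.0,
                 paused: Callable[[], bool] = lambda: False) -> Optional[str]:
    """Repeat the cycle a fixed number of times, honoring a pause check."""
    last: Optional[str] = None
    for _ in range(cycles):
        if not paused():
            last = narrate_once(capture, describe, speak, style, last)
        time.sleep(interval_s)
    return last
```

Passing the three stages in as functions keeps the loop testable with fakes and leaves the choice of Gemini and ElevenLabs client libraries open.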
Installation
Installation requires access to the source repository and a local Python environment. Navigate to your source directory and follow the standard setup process:
- Ensure you are on a macOS system.
- Navigate to the source directory (for example, `/Users/buddy/narrator`).
- Create and activate a virtual environment: `python3 -m venv .venv`, then `source .venv/bin/activate`.
- Install dependencies: `pip install -r requirements.txt`.
- Make sure the `GEMINI_API_KEY` and `ELEVENLABS_API_KEY` environment variables are set.
- For OpenClaw users, install via: `clawhub install openclaw/skills/skills/buddyh/narrator`.
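Before the first run, a small pre-flight check can catch the setup gaps listed above. This is a hypothetical helper, not part of the skill: it checks that `platform.system()` reports macOS (`"Darwin"`), that both API keys are set, and that Python is running inside a virtual environment.

```python
import os
import platform
import sys
from typing import List, Mapping, Optional

# Environment variables named in the installation steps.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVENLABS_API_KEY")


def preflight(env: Optional[Mapping[str, str]] = None,
              system: Optional[str] = None,
              in_venv: Optional[bool] = None) -> List[str]:
    """Return a list of setup problems; an empty list means all checks passed.

    The parameters default to the live environment; they exist so the
    checks can be exercised with fake values.
    """
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base interpreter.
    in_venv = (sys.prefix != sys.base_prefix) if in_venv is None else in_venv

    problems: List[str] = []
    if system != "Darwin":  # platform.system() reports macOS as "Darwin"
        problems.append(f"macOS required, found {system}")
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is not set")
    if not in_venv:
        problems.append("not running inside a virtual environment")
    return problems
```

Call `preflight()` from the activated environment and print whatever it returns; an empty list means every check passed.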
Use Cases
This skill is designed for power users and creatives. Common use cases include:
- Accessibility & Monitoring: Get audio alerts or summaries of dashboard changes when your eyes are off the screen.
- Content Creation: Use the 'horror' or 'reality_tv' styles to create entertaining commentary for live-streamed desktop demonstrations or tutorials.
- Productivity & Focus: Use the 'asmr' style to create a unique auditory backdrop while processing tasks.
- Debugging: Monitor system changes with an active, descriptive 'sports' style narrator providing play-by-play updates on UI state transitions.
Example Prompts
- "Start narrating my screen activity in horror style immediately."
- "Switch the narrator to ASMR mode and set the profanity level to low."
- "Pause the screen narration and give me a status update on the current session."
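Each example prompt ultimately amounts to a small control change. The sketch below turns already-parsed intent into a command dictionary. The keys `style`, `profanity`, and `paused` and the style identifiers are hypothetical, taken from the wording on this page rather than from a documented schema.

```python
from typing import Optional

# Style names as they appear on this page; the skill's real identifiers
# are an assumption (e.g. "noir" may be spelled differently).
KNOWN_STYLES = {"sports", "noir", "horror", "reality_tv", "asmr"}


def build_command(style: Optional[str] = None,
                  profanity: Optional[str] = None,
                  paused: Optional[bool] = None) -> dict:
    """Build a control command from parsed intent (hypothetical schema)."""
    cmd: dict = {}
    if style is not None:
        style = style.lower()
        if style not in KNOWN_STYLES:
            raise ValueError(f"unknown style: {style!r}")
        cmd["style"] = style
    if profanity is not None:
        cmd["profanity"] = profanity
    if paused is not None:
        cmd["paused"] = paused
    return cmd


# "Switch the narrator to ASMR mode and set the profanity level to low."
print(build_command(style="ASMR", profanity="low"))
# {'style': 'asmr', 'profanity': 'low'}
```

Validating the style name before anything is written means a typo fails loudly instead of leaving the narrator in an unknown state.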
Tips & Limitations
- macOS Exclusive: This skill relies on native macOS screen capture APIs; it will not function on Linux or Windows.
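- Performance: Screen recording can be CPU-intensive, especially at high resolution, so make sure you have spare resources. Model inference runs on the Gemini and ElevenLabs APIs rather than locally, so network latency also affects how quickly narration keeps up.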
- Control File: Commands are read from `/tmp/narrator-ctl.json`. If commands are slow to take effect, check that file for locks or permission issues.
- Cost: Both Gemini Vision (image analysis) and ElevenLabs (TTS generation) charge for API usage. Monitor your usage to manage expenses.
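If commands are applied late or seem to be ignored, one possible cause is the narrator reading the control file while another process is partway through writing it. A minimal sketch, assuming the narrator re-reads `/tmp/narrator-ctl.json` as one whole JSON document (the expected schema is not documented here, so the command contents are hypothetical): write to a temporary file in the same directory, then swap it into place with `os.replace`, which is atomic on POSIX filesystems when source and target share a filesystem.

```python
import json
import os
import tempfile

CTL_PATH = "/tmp/narrator-ctl.json"  # control-file path named on this page


def write_command(cmd: dict, path: str = CTL_PATH) -> None:
    """Atomically replace the control file with `cmd` serialized as JSON.

    A reader sees either the old file or the complete new one, never a
    half-written document.
    """
    directory = os.path.dirname(path) or "."
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".narrator-ctl-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(cmd, fh)
            fh.flush()
            os.fsync(fh.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp_path, path)  # atomic rename over the old control file
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind on failure
        raise
```

For example, `write_command({"paused": True})` would request a pause, assuming the narrator recognizes a `paused` key.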
Metadata
Paste this into your clawhub.json to enable this plugin:
{
  "plugins": {
    "official-buddyh-narrator": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: file-write, file-read, external-api, code-execution