youtube-voice-summarizer
Transform YouTube videos into podcast-style voice summaries using ElevenLabs TTS
Why use this skill?
Convert any YouTube video into a professional, podcast-style audio summary using ElevenLabs, OpenRouter, and Supadata, and fit long-form content into your learning workflow.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/franciscoandsam/youtube-voice-summarizer-elevenlabs
What This Skill Does
The YouTube Voice Summarizer is a productivity tool that distills long-form video content into listener-friendly audio summaries. It combines three services: Supadata for transcript extraction, OpenRouter for summarization, and ElevenLabs for text-to-speech synthesis. Together they let you consume a YouTube video on the go without watching the source media. The skill turns hours of video into "podcast-style" snippets that can be digested in under 60 seconds.
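The three-stage flow described above can be sketched roughly as follows. This is an illustration only, not the skill's actual code: the function name `summarize_to_audio` and the three callables are hypothetical stand-ins for whatever client code talks to Supadata, OpenRouter, and ElevenLabs.

```python
from typing import Callable


def summarize_to_audio(
    video_url: str,
    fetch_transcript: Callable[[str], str],   # hypothetical Supadata wrapper
    summarize: Callable[[str], str],          # hypothetical OpenRouter wrapper
    synthesize: Callable[[str], bytes],       # hypothetical ElevenLabs wrapper
) -> bytes:
    """Run the stages in order: transcript -> summary -> audio bytes."""
    transcript = fetch_transcript(video_url)
    if not transcript.strip():
        # Without captions there is nothing to summarize.
        raise ValueError("no transcript available for this video")
    summary = summarize(transcript)
    return synthesize(summary)
```

Passing the stages in as callables keeps the sketch independent of any particular client library, and makes each stage easy to replace or stub out.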
Installation
To deploy this skill, first set up the backend server by following the instructions in the official repository at https://github.com/Franciscomoney/elevenlabs-moltbot.git. After cloning the repository and installing its dependencies, configure your .env file with valid API keys for ElevenLabs (TTS), Supadata (transcripts), and OpenRouter (summarization). Once the server is running, register the skill in the OpenClaw environment with the install command: clawhub install openclaw/skills/skills/franciscoandsam/youtube-voice-summarizer-elevenlabs. Your environment must allow outbound network requests to all three external services.
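Before starting the server, it can help to confirm that all three keys are actually present. The check below is a minimal sketch: the variable names are assumptions (the repository's own .env template is the authority on the real names), and it assumes the .env values have already been loaded into the process environment.

```python
import os
from typing import List, Mapping

# Hypothetical variable names; check the repository's .env template for the real ones.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "SUPADATA_API_KEY", "OPENROUTER_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required keys that are unset or blank."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Calling `missing_keys()` with no arguments inspects the current environment; an empty list means every key is set.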
Use Cases
This skill is perfect for busy professionals, students, or researchers who need to stay informed across multiple domains. Use cases include:
- Daily Briefings: Catching up on multiple long-form tech podcasts or educational lectures during a morning commute.
- Content Curation: Quickly scanning video essays or news reports to determine if they contain relevant data before committing full attention to the content.
- Accessibility: Providing a text-to-audio format for users who prefer auditory learning or are currently away from visual displays.
- Efficient Learning: Generating "key takeaway" audio files that serve as study guides for complex video tutorials.
Example Prompts
- "Summarize this YouTube link for me as a short podcast so I can listen on my commute: https://www.youtube.com/watch?v=example"
- "Give me a detailed 10-minute audio breakdown of this AI lecture using the casual voice option."
- "Can you provide a quick audio summary of the latest news report on this video? Keep it concise."
Tips & Limitations
- Transcription Quality: The skill relies on YouTube subtitles. If a video lacks captions or has poor auto-generated text, the summarization accuracy will drop significantly.
- Processing Time: While the generation is fast, it is a multi-step asynchronous process. Always use the provided status polling mechanism to ensure you are receiving the final audio file.
- Voice Selection: Experiment with different voice styles to match the tone of the video; use 'podcast' for professional content and 'casual' for interviews.
- API Costs: Be aware that each request consumes credits across three different APIs; monitor your usage dashboards to prevent unexpected billing.
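The status polling mentioned under Processing Time could look something like the sketch below. The response shape is an assumption: the field names `status`, `audio_url`, and `error`, and the values `completed` and `failed`, are hypothetical, so consult the backend repository for the real status endpoint and payload. The `get_status` callable stands in for whatever request fetches the job status.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_audio(
    get_status: Callable[[], Mapping[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
) -> str:
    """Poll until the job finishes, then return the audio location."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        state = status.get("status")  # hypothetical field name
        if state == "completed":
            return status["audio_url"]  # hypothetical field name
        if state == "failed":
            raise RuntimeError(status.get("error", "summarization job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("audio was not ready before the timeout")
        time.sleep(interval_s)
```

A fixed interval with an overall timeout keeps the loop simple; a longer interval also reduces the number of requests made while the job is still processing.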
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-franciscoandsam-youtube-voice-summarizer-elevenlabs": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: network-access, external-api