browser-audio-capture
Capture audio from any browser tab — meetings, YouTube, podcasts, courses, webinars — and stream to any AI agent. Zero API keys, works with any framework.
Why use this skill?
Capture audio from any Chrome tab—meetings, YouTube, or courses—and stream it to your AI agent for real-time transcription and notes.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/jarvis563/browser-audio-capture
What This Skill Does
The browser-audio-capture skill turns your browser into a real-time audio input device for your AI agent. Using Chrome's remote debugging protocol and a lightweight extension, it intercepts raw PCM16 audio streams from any active browser tab and streams them locally to a configured HTTP endpoint. Because it captures audio at the browser level rather than through platform-specific APIs, it works with anything that plays in a tab: Google Meet, Zoom, Microsoft Teams, online lectures, webinars, and niche streaming sites. In effect, it gives your agent 'ears' inside your browser window, enabling real-time transcription, summarization, and interactive analysis without per-platform integrations or third-party API keys.
Installation
To begin, ensure you have Python 3.9+ installed along with the aiohttp library. First, launch Chrome with the remote debugging flags: --remote-debugging-port=9222 --user-data-dir=$HOME/.chrome-debug-profile. Once Chrome is running, install the extension by navigating to chrome://extensions/, enabling Developer Mode, and loading the scripts/extension/ directory as an unpacked extension. Then install the agent skill from the terminal with clawhub install openclaw/skills/skills/jarvis563/browser-audio-capture. Finally, make sure your local audio processing pipeline (Whisper, Riva, or similar) is listening on http://127.0.0.1:8900/audio/browser to receive the incoming base64-encoded chunks.
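On the receiving side, each chunk has to be turned from base64 text back into PCM16 samples before it reaches a transcription model. The sketch below shows one way to do that with the standard library only. The listing does not document the request body format, so the function names (decode_pcm16_chunk, rms_level) are hypothetical, and the little-endian mono layout is an assumption to verify against what the extension actually sends.

```python
import array
import base64
import math
import sys


def decode_pcm16_chunk(b64_chunk: str) -> array.array:
    """Decode a base64 string into signed 16-bit samples.

    Assumption: samples are little-endian and mono; the skill's actual
    payload layout is not documented in this listing.
    """
    raw = base64.b64decode(b64_chunk)
    if len(raw) % 2:
        raw = raw[:-1]  # drop a trailing odd byte instead of failing
    samples = array.array("h")  # "h" = signed 16-bit integer
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # convert little-endian input on big-endian hosts
    return samples


def rms_level(samples: array.array) -> float:
    """Return the root-mean-square amplitude, a quick check for silence."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

An RMS value that stays near zero across many chunks usually means the tab is muted or the wrong tab is being captured, which is worth checking before debugging the transcription step.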
Use Cases
- Live Meeting Assistant: Automatically generate real-time transcripts and action items for Zoom or Google Meet calls.
- Educational Support: Capture audio from recorded lectures or live webinars to create structured study notes and summaries automatically.
- Competitive Intelligence: Stream audio from earnings call webcasts to your AI agent to extract insights, sentiment, or key financial metrics.
- Customer Call Analysis: Monitor and analyze sales demos or support calls happening in the browser to receive immediate agent feedback or suggested responses.
Example Prompts
- 'Listen to the current Google Meet tab and summarize the key action items in a bulleted list every 5 minutes.'
- 'Start capturing audio from the YouTube lecture in my open tab and create a study guide based on the concepts mentioned.'
- 'Monitor the earnings call in my browser and notify me whenever the speaker mentions revenue projections or growth targets.'
Tips & Limitations
- Debugging: If you cannot find active tabs, always verify that your Chrome browser was launched with the correct --remote-debugging-port flag.
- Persistence: Chrome's Manifest V3 can be aggressive with caching; if the extension behaves unexpectedly, remove and reload the unpacked extension to clear the state.
- Network: This tool streams locally. Ensure your firewall allows communication on port 8900 if your AI processing pipeline is running on a different local service or container.
- Privacy: As this tool captures all audio in a target tab, ensure you comply with local recording and privacy regulations regarding meeting consent.
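For the debugging tip above, one quick way to confirm that Chrome is exposing its remote debugging port is to query the DevTools HTTP endpoint, which returns a JSON list of open targets. This is a minimal sketch, assuming the port 9222 used in the installation step; the helper names (fetch_targets, page_targets) are hypothetical and not part of the skill.

```python
import json
import urllib.request


def page_targets(targets):
    """Keep only regular tabs and return (title, url) pairs."""
    return [
        (t.get("title", ""), t.get("url", ""))
        for t in targets
        if t.get("type") == "page"
    ]


def fetch_targets(port=9222, timeout=2.0):
    """Fetch the target list from Chrome's DevTools HTTP endpoint.

    Raises OSError (including urllib.error.URLError) if nothing is
    listening on the port, which usually means Chrome was started
    without --remote-debugging-port.
    """
    url = f"http://127.0.0.1:{port}/json/list"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling page_targets(fetch_targets()) with Chrome running should list the open tabs; a connection error points to a missing or mismatched debugging flag rather than a problem with the extension.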
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-jarvis563-browser-audio-capture": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: network-access, data-collection
Related Skills
Percept Ambient
Skill by jarvis563
Percept Summarize
Skill by jarvis563
Percept Voice Cmd
Skill by jarvis563
Percept Speaker Id
Skill by jarvis563
stable-browser
Set up reliable browser automation using Chrome DevTools Protocol (CDP) instead of the flaky browser extension relay. Use when browser relay keeps disconnecting, throwing WebSocket 403 errors, or when you need stable headless/headed browser control for web scraping, form filling, social media posting, or any browser automation task. Replaces profile="chrome" with a rock-solid CDP connection.