augent
The audio & video layer for agents. 22 local MCP tools. No cloud, no API keys.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/augentdevs/augent
Augent: Audio & Video Intelligence for AI Agents
Augent is an MCP server that gives your agent 22 tools for audio and video intelligence. Download from 1000+ sites via yt-dlp and aria2c, transcribe in 99 languages via faster-whisper, search by keyword or meaning via sentence-transformers, take notes, identify speakers via pyannote-audio, detect chapters, separate audio via Demucs v4, export clips, extract visual frames, record X/Twitter Spaces (requires user-configured auth token in ~/.augent/auth.json), and generate speech via Kokoro TTS. All processing runs locally. Downloads are saved to ~/Downloads/, notes and clips to ~/Desktop/, transcription memory to ~/.augent/memory/.
Config
{
  "mcpServers": {
    "augent": {
      "command": "augent-mcp"
    }
  }
}
If augent-mcp is not in PATH, use python3 -m augent.mcp as the command instead.
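The fallback above can be automated when generating the config. The sketch below is a minimal illustration, not part of Augent: `build_mcp_config` is a hypothetical helper, and splitting `python3 -m augent.mcp` into a `command` plus an `args` list is an assumption based on the convention many MCP clients use, so check what your client expects.

```python
import json
import shutil


def build_mcp_config(which=shutil.which):
    """Return an mcpServers config dict for Augent (hypothetical helper).

    `which` is injectable so the lookup can be replaced in tests.
    """
    if which("augent-mcp"):
        # The console script is on PATH, so use it directly.
        server = {"command": "augent-mcp"}
    else:
        # Fallback: run the module with the interpreter.
        # Assumption: the MCP client accepts a separate "args" list.
        server = {"command": "python3", "args": ["-m", "augent.mcp"]}
    return {"mcpServers": {"augent": server}}


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(), indent=2))
```

Running the script prints a config block that can be pasted into the client's MCP settings.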
Install
Install via the ClawHub install button above, or with uv: uv tool install augent for the base package, or uv tool install "augent[all]" for all features. FFmpeg is required for audio processing.
Tools
Augent exposes 22 MCP tools:
Core
| Tool | Description |
|---|---|
| download_audio | Download audio from video URLs at maximum speed. Supports YouTube, Vimeo, TikTok, Twitter/X, SoundCloud, and 1000+ sites. Uses aria2c multi-connection + concurrent fragments. |
| transcribe_audio | Full transcription of any audio file with per-segment timestamps. Returns text, language, duration, and segments. Cached by file hash. |
| search_audio | Search audio for keywords. Returns timestamped matches with context snippets. Supports clip export. |
| deep_search | Semantic search: find moments by meaning, not just keywords. Uses sentence-transformers embeddings. |
| search_memory | Search across ALL stored transcriptions in one query. Keyword or semantic mode. |
| take_notes | All-in-one: download audio from URL, transcribe, and save formatted notes. Supports 5 styles: tldr, notes, highlight, eye-candy, quiz. |
| clip_export | Export a video clip from any URL for a specific time range. Downloads only the requested segment. |
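The transcribe_audio entry notes that results are cached by file hash. The sketch below shows one way such a cache could work. It is an illustration under stated assumptions, not Augent's actual implementation: `file_sha256`, `cached_transcribe`, the SHA-256 choice, and the one-JSON-file-per-hash layout are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large audio files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_transcribe(path, cache_dir, transcribe):
    """Return a cached result for `path`, calling `transcribe` only on a miss.

    `transcribe` is any callable returning a JSON-serializable dict
    (a stand-in for the real transcription step).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key on content, not filename: renamed copies of a file still hit the cache.
    entry = cache_dir / f"{file_sha256(path)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    result = transcribe(path)
    entry.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Keying on the content hash rather than the path means a second request for the same audio skips transcription even if the file has been moved or renamed.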
Analysis
| Tool | Description |
|---|---|
| chapters | Auto-detect topic chapters with timestamps using embedding similarity. |
| search_proximity | Find where two keywords appear near each other (e.g., "startup" within 30 words of "funding"). |
| identify_speakers | Speaker diarization: identify who speaks when. No API keys required. |
| separate_audio | Isolate vocals from music/noise using Meta's Demucs v4. Feed clean vocals into transcription. |
| batch_search | Search multiple audio files in parallel. Ideal for podcast libraries or interview collections. |
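To make the search_proximity idea concrete, here is a minimal sketch of a word-distance check over plain transcript text. It is not Augent's implementation: `find_proximity` and its parameters are hypothetical, and the real tool presumably also maps matches back to timestamps.

```python
import re


def find_proximity(text, first, second, max_distance=30):
    """Return (i, j) word-index pairs where `first` at index i and
    `second` at index j are at most `max_distance` words apart.

    Matching is case-insensitive and on whole words only.
    """
    words = [w.lower() for w in re.findall(r"[\w'-]+", text)]
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [j for j, w in enumerate(words) if w == second.lower()]
    return [
        (i, j)
        for i in first_pos
        for j in second_pos
        if abs(i - j) <= max_distance
    ]


if __name__ == "__main__":
    sample = "The startup closed a new round of funding last week"
    print(find_proximity(sample, "startup", "funding", max_distance=30))
```

The pairwise comparison is quadratic in the number of matches, which is fine for a single transcript but would need a sliding window for very frequent terms.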
Utilities
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-augentdevs-augent": {
      "enabled": true,
      "auto_update": true
    }
  }
}