Augent — Audio & Video Intelligence for AI Agents

Augent is an MCP server that gives your agent 22 tools for audio and video intelligence. With it, an agent can download from 1000+ sites via yt-dlp and aria2c, transcribe in 99 languages via faster-whisper, search by keyword or by meaning via sentence-transformers, take notes, identify speakers via pyannote-audio, detect chapters, separate audio via Demucs v4, export clips, extract visual frames, record X/Twitter Spaces (requires a user-configured auth token in ~/.augent/auth.json), and generate speech via Kokoro TTS. All processing runs locally. Downloads are saved to ~/Downloads/, notes and clips to ~/Desktop/, and transcription memory to ~/.augent/memory/.

Config

{
  "mcpServers": {
    "augent": {
      "command": "augent-mcp"
    }
  }
}

If augent-mcp is not in PATH, use python3 -m augent.mcp as the command instead.
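The fallback described above can be sketched as a small Python helper that emits the config entry. This is an illustration, not part of Augent: `build_augent_config` is a hypothetical name, and splitting the fallback into a `command` plus an `args` list assumes your MCP client accepts the common `command`/`args` layout.

```python
import json
import shutil


def build_augent_config(which=shutil.which):
    """Build the mcpServers entry, preferring the augent-mcp executable.

    `which` is injectable so the lookup can be replaced in tests.
    """
    if which("augent-mcp"):
        server = {"command": "augent-mcp"}
    else:
        # Assumption: the client takes the interpreter as "command"
        # and the module invocation as a separate "args" list.
        server = {"command": "python3", "args": ["-m", "augent.mcp"]}
    return {"mcpServers": {"augent": server}}


def render_augent_config(which=shutil.which):
    """Return the config as pretty-printed JSON text."""
    return json.dumps(build_augent_config(which), indent=2)
```

Calling `print(render_augent_config())` prints a block you can paste into your client's MCP settings file; where that file lives depends on the client.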

Install

Install via the ClawHub install button above, or from the command line with uv: run uv tool install augent for the base package, or uv tool install "augent[all]" for all features. FFmpeg is required for audio processing.

Tools

Augent exposes 22 MCP tools:

Core

Tool	Description
`download_audio`	Download audio from video URLs at maximum speed. Supports YouTube, Vimeo, TikTok, Twitter/X, SoundCloud, and 1000+ other sites. Uses aria2c multi-connection downloads with concurrent fragments.
`transcribe_audio`	Full transcription of any audio file with per-segment timestamps. Returns text, language, duration, and segments. Cached by file hash.
`search_audio`	Search audio for keywords. Returns timestamped matches with context snippets. Supports clip export.
`deep_search`	Semantic search — find moments by meaning, not just keywords. Uses sentence-transformers embeddings.
`search_memory`	Search across ALL stored transcriptions in one query. Keyword or semantic mode.
`take_notes`	All-in-one: download audio from URL, transcribe, and save formatted notes. Supports 5 styles: tldr, notes, highlight, eye-candy, quiz.
`clip_export`	Export a video clip from any URL for a specific time range. Downloads only the requested segment.
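To illustrate what "cached by file hash" means for `transcribe_audio`, here is a minimal sketch of content-addressed caching. It is not Augent's implementation: the choice of SHA-256, the one-JSON-file-per-hash layout, and the names `file_sha256` and `cached_transcribe` are all assumptions made for the example, and `transcribe` stands in for any function that returns a JSON-serializable result.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large audio files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_transcribe(audio_path, cache_dir, transcribe):
    """Return a cached result for this file's contents, or compute and store it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / (file_sha256(audio_path) + ".json")
    if entry.exists():
        # Same bytes were transcribed before: skip the expensive call.
        return json.loads(entry.read_text(encoding="utf-8"))
    result = transcribe(audio_path)
    entry.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Because the key is derived from the file's bytes rather than its name, renaming or moving a file still hits the cache, while any change to the audio produces a new entry.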

Analysis

Tool	Description
`chapters`	Auto-detect topic chapters with timestamps using embedding similarity.
`search_proximity`	Find where two keywords appear near each other (e.g., "startup" within 30 words of "funding").
`identify_speakers`	Speaker diarization — identify who speaks when. No API keys required.
`separate_audio`	Isolate vocals from music/noise using Meta's Demucs v4. Feed clean vocals into transcription.
`batch_search`	Search multiple audio files in parallel. Ideal for podcast libraries or interview collections.
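As a rough picture of what `search_proximity` looks for, the sketch below finds pairs of keywords that occur within a given number of words of each other in plain text. It is a simplified illustration under stated assumptions: `find_near` is a hypothetical helper, it works on word positions rather than the timestamped transcript segments the real tool returns, and it matches whole words exactly.

```python
import re


def find_near(text, first, second, max_distance=30):
    """Return (first_index, second_index) word positions that are
    at most max_distance words apart, ignoring case."""
    words = re.findall(r"[\w']+", text.lower())
    first_hits = [i for i, word in enumerate(words) if word == first.lower()]
    second_hits = [i for i, word in enumerate(words) if word == second.lower()]
    return [
        (i, j)
        for i in first_hits
        for j in second_hits
        if abs(i - j) <= max_distance
    ]
```

For example, `find_near("Our startup raised funding last year", "startup", "funding")` reports the pair at word positions 1 and 3; mapping positions back to timestamps would require per-word or per-segment timing from the transcript.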

Install via CLI (Recommended)