deepgram-discord-voice
Voice-channel conversations in Discord using Deepgram streaming STT + low-latency TTS
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/adriel1006/discord-voice-deepgram
What This Skill Does
The deepgram-discord-voice skill lets your OpenClaw agent act as a voice assistant inside a Discord voice channel. It uses Deepgram's streaming STT (speech-to-text) and Aura TTS (text-to-speech) engines to provide a low-latency, near real-time conversation loop: it captures voice input from the channel, sends the transcript to the agent for processing, and streams the generated response back into the voice call as Ogg/Opus audio. You can then talk to your AI agent much as you would to another member of your server.
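The conversation loop described above can be sketched as a single turn that passes through four stages. This is an illustrative outline only: the `Stages` interface, its four function names, and `runTurn` are hypothetical and are injected by the caller, so nothing here reflects the skill's real internals or any Deepgram or Discord API.

```typescript
// Hypothetical stage signatures (assumptions for illustration only).
interface Stages {
  transcribe: (audio: Uint8Array) => Promise<string>; // speech-to-text
  respond: (text: string) => Promise<string>; // agent produces a reply
  synthesize: (text: string) => Promise<Uint8Array>; // text-to-speech, e.g. Ogg/Opus bytes
  play: (audio: Uint8Array) => Promise<void>; // playback into the voice call
}

// Runs one conversational turn and returns the reply text,
// or null when the captured audio contained no speech.
async function runTurn(audio: Uint8Array, s: Stages): Promise<string | null> {
  const transcript = (await s.transcribe(audio)).trim();
  if (transcript === "") return null; // silence or noise: skip the agent call
  const reply = await s.respond(transcript);
  await s.play(await s.synthesize(reply));
  return reply;
}
```

Passing the stages in as functions keeps the sketch independent of any particular SDK and makes each step easy to stub out when testing.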
Installation
Installation can be completed in two ways. For automated deployment, use the ClawHub dashboard: navigate to 'Skills/Plugins', search for 'deepgram-discord-voice', and click Install. For manual installation, clone the repository into your project's plugins directory, run npm install to resolve dependencies, and restart your OpenClaw instance. In either case, provide your DISCORD_TOKEN and DEEPGRAM_API_KEY in the environment configuration, and grant your Discord bot the 'Connect', 'Speak', and 'Use Voice Activity' permissions in your server settings.
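As a pre-flight check for the requirements above, the following sketch computes the combined permissions integer for the three voice permissions and reports missing environment variables. The bit positions are the ones Discord's permissions documentation lists for CONNECT, SPEAK, and USE_VAD; the helper `missingEnv` and the constant `REQUIRED_ENV` are hypothetical names introduced for illustration and are not part of the skill.

```typescript
// Discord permission bit flags (from Discord's permissions documentation).
const CONNECT = 1 << 20; // join a voice channel
const SPEAK = 1 << 21; // transmit audio
const USE_VAD = 1 << 25; // "Use Voice Activity"

// Combined permissions integer, usable in a bot invite URL or role setup.
const voicePermissions = CONNECT | SPEAK | USE_VAD;

// Variable names taken from the installation notes.
const REQUIRED_ENV = ["DISCORD_TOKEN", "DEEPGRAM_API_KEY"];

// Hypothetical helper: returns the names of required variables
// that are unset or contain only whitespace.
function missingEnv(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process you would call `missingEnv(process.env, REQUIRED_ENV)` at startup and refuse to join a voice channel if the result is non-empty.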
Use Cases
This skill suits voice-driven workflows such as brainstorming sessions, virtual stand-ups, and remote collaborative tasks. It fits teams that prefer speaking over typing and users who want hands-free interaction with their AI agent while working on other computer tasks. It also supports accessibility by providing a voice-first interface for users who have difficulty with keyboard-based input.
Example Prompts
- "Openclaw, allow Alice to speak to you so she can help with the project report."
- "Openclaw, summarize the last three things discussed in this channel."
- "Openclaw, only me from now on; stop listening to others in the channel."
Tips & Limitations
To minimize latency, set both streamingSTT and streamingTTS to true in your configuration. The primaryUser setting is an important security feature: it prevents unauthorized users from hijacking your agent's context. Voice commands depend on the wakeWord, which defaults to 'openclaw'. For the best experience, choose a Discord voice region that is geographically close to your Deepgram processing region to reduce packet travel time. High network jitter in Discord voice channels can occasionally reduce the accuracy of the transcription stream.
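The way primaryUser and wakeWord gate incoming speech can be illustrated with a minimal sketch. Only the four setting names come from the tips above; the `VoiceConfig` shape, the assumption that primaryUser holds a Discord user ID, the placeholder ID, and the `extractCommand` helper are all hypothetical. The sketch also ignores any additional speakers the primary user may have allowed.

```typescript
// Hypothetical config shape; only the setting names come from the skill's tips.
interface VoiceConfig {
  streamingSTT: boolean;
  streamingTTS: boolean;
  wakeWord: string;
  primaryUser: string; // assumed here to be a Discord user ID
}

const config: VoiceConfig = {
  streamingSTT: true, // recommended for low latency
  streamingTTS: true, // recommended for low latency
  wakeWord: "openclaw", // the documented default
  primaryUser: "123456789012345678", // placeholder ID
};

// Returns the command text that follows the wake word, or null when the
// utterance should be ignored (wrong speaker or no leading wake word).
function extractCommand(
  cfg: VoiceConfig,
  speakerId: string,
  transcript: string,
): string | null {
  if (speakerId !== cfg.primaryUser) return null;
  const text = transcript.trim();
  const wake = cfg.wakeWord.toLowerCase();
  if (!text.toLowerCase().startsWith(wake)) return null;
  // Drop the wake word and any punctuation or whitespace right after it.
  return text.slice(wake.length).replace(/^[\s,.:;!?]+/, "");
}
```

For example, `extractCommand(config, config.primaryUser, "Openclaw, summarize the channel.")` yields the text after the wake word, while the same sentence from any other speaker is dropped. A simple prefix match like this would also accept words that merely begin with the wake word, so a real implementation would likely check word boundaries.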
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-adriel1006-discord-voice-deepgram": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: network-access, external-api, data-collection