agent-health-diagnostics
Diagnose and fix the 4 most common OpenClaw agent failures — heartbeat spam, API rate limit cascades, channel death loops, and memory/embedding errors. Battle-tested across a 6-agent multi-host deployment.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/agenthyjack/agent-health-diagnostics
Agent Health Diagnostics
Scripts available in the Collective Skills repo
Overview
When an OpenClaw agent misbehaves — spamming messages, going dark, burning API credits, or looping on dead channels — this skill provides the diagnostic playbook. Covers the 4 most common failure modes with exact commands to diagnose and fix each one.
Battle-tested across a 6-agent deployment spanning 3 hosts (Windows + Linux + Proxmox).
When to Use This Skill
Use when you observe any of these symptoms:
- Agent sending repeated heartbeat/status messages to Telegram/Discord/etc.
- Agent goes silent despite gateway showing "active"
- Logs show "429 Too many tokens" or "rate_limit" errors
- Channel connection loops: "auto-restart attempt 1/10", "2/10", etc.
- Memory search errors: "input length exceeds context length"
- Gateway says "active" but agent doesn't respond to messages
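The symptom strings above can be turned into a rough first-pass triage. The following is a minimal sketch, not part of the skill's own scripts: `classify_line` and `triage_log` are hypothetical helper names, and the patterns assume the log wording matches the strings quoted in the list. Heartbeat spam is omitted because the source gives no log signature for it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map one log line to the failure mode it most likely indicates.
# Patterns are the symptom strings quoted above; real log wording may differ.
# Note: the "429" pattern is loose and can match unrelated numbers (e.g. in timestamps).
classify_line() {
  case "$1" in
    *"429"*|*"rate_limit"*)                  echo "rate-limit-cascade" ;;
    *"auto-restart attempt"*)                echo "channel-death-loop" ;;
    *"input length exceeds context length"*) echo "memory-embedding-error" ;;
    *)                                       echo "unknown" ;;
  esac
}

# Read log lines from stdin and print a count per failure mode.
triage_log() {
  while IFS= read -r line; do
    classify_line "$line"
  done | sort | uniq -c
}

# Usage (replace the service placeholder with your unit name):
#   journalctl -u <service> --since '1h ago' | triage_log
```

A large "unknown" count is expected, since most log lines are routine; the other three counts are the ones to look at.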
The 4 Failure Modes
1. Heartbeat Spam
Symptom: Agent sends repeated messages every N minutes.
Root cause: Heartbeat interval too low (10m = 144 messages/day) plus a verbose prompt that always generates output instead of HEARTBEAT_OK.
Quick fix:
# Check interval
grep -A5 heartbeat ~/.openclaw/openclaw.json
# Fix: set to 30m minimum, simplify prompt to checklist + HEARTBEAT_OK default
# Then restart gateway
openclaw gateway restart
Prevention: Never set the heartbeat interval below 20 minutes; 30 minutes or more is the safer default, matching the fix above. Heartbeat prompts should CHECK things, not CREATE things.
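The "10m = 144 messages/day" figure is simply the number of intervals in a day. As a quick sketch of that arithmetic, assuming the worst case where every heartbeat produces a message (`messages_per_day` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Messages per day if every heartbeat produces output.
# Arg: heartbeat interval in minutes (integer division).
messages_per_day() {
  local interval_min=$1
  echo $(( 24 * 60 / interval_min ))
}

# Compare a few candidate intervals.
for interval in 10 20 30 60; do
  echo "${interval}m -> $(messages_per_day "$interval") messages/day"
done
```

Moving from 10 to 30 minutes cuts the worst case from 144 to 48 messages per day, before any prompt simplification.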
2. API Rate Limit Cascade
Symptom: All models fail, agent goes dark.
Root cause: Heartbeat + N crons = (N+1) API calls per interval. When that combined load exceeds the provider's tokens-per-minute (TPM) limit, all fallback models are exhausted simultaneously.
Quick fix:
# Check for rate limits
journalctl -u <service> --since '1h ago' | grep '429\|rate_limit'
# Count your crons (each burns tokens)
openclaw cron list
# Fix: reduce heartbeat to 30-60m, disable non-essential crons, stagger schedules
Prevention: Calculate token budget before adding crons. Each run ≈ 2K-10K tokens. Route heartbeats to cheap/local models.
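The budget calculation can be sketched as follows. This is an illustrative estimate under a simplifying assumption: the heartbeat and all crons fire within the same minute, so the burst is (N+1) runs times tokens per run. The function names and the 40K TPM limit in the example are hypothetical; substitute your provider's actual limit.

```shell
#!/usr/bin/env bash
# Worst-case tokens consumed when the heartbeat and all crons fire together.
# Args: number of crons, tokens per run.
burst_tokens() {
  local crons=$1 tokens_per_run=$2
  echo $(( (crons + 1) * tokens_per_run ))
}

# Succeeds (exit 0) when the burst fits within the given TPM limit.
# Args: number of crons, tokens per run, TPM limit.
fits_budget() {
  local crons=$1 tokens_per_run=$2 tpm_limit=$3
  [ "$(burst_tokens "$crons" "$tokens_per_run")" -le "$tpm_limit" ]
}

# Example with assumed numbers: 5 crons, 10K tokens per run, 40K TPM limit.
if fits_budget 5 10000 40000; then
  echo "within budget"
else
  echo "over budget: $(burst_tokens 5 10000) tokens vs 40000 TPM"
fi
```

Using the upper end of the 2K-10K range gives a conservative answer; staggering schedules reduces the real burst below this estimate.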
3. Channel Death Loop
Symptom: Logs show repeated auto-restart attempt N/10 for IRC/Discord/etc.
Root cause: Target server unreachable → health monitor restarts → fails again → loop. Each restart may trigger model calls, burning API tokens.
Quick fix:
# Check for loops
journalctl -u <service> --since '1h ago' | grep 'auto-restart\|timed out'
# Test connectivity
nc -zv -w 5 <target-ip> <target-port>
# Fix: disable the broken channel in openclaw.json
# channels.<name>.enabled = false
openclaw gateway restart
Prevention: Test connectivity BEFORE enabling channels. Disable channels you can't reach.
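To see how far a restart loop has progressed, the attempt counter can be pulled out of the logs. A minimal sketch, assuming the log wording matches the "auto-restart attempt N/10" string quoted in the symptom; `max_restart_attempt` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the highest "auto-restart attempt N/M" counter found on stdin (0 if none).
# Assumes the log wording matches the symptom string quoted above.
max_restart_attempt() {
  grep -oE 'auto-restart attempt [0-9]+/[0-9]+' \
    | sed -E 's/.*attempt ([0-9]+)\/.*/\1/' \
    | sort -n | tail -n 1 | grep . || echo 0
}

# Usage (replace the service placeholder with your unit name):
#   journalctl -u <service> --since '1h ago' | max_restart_attempt
```

A value that keeps climbing between checks suggests the channel is still unreachable and should be disabled rather than left to retry.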
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-agenthyjack-agent-health-diagnostics": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Related Skills
calling-agent-squad
Activate a multi-agent team (the Squad) to manage complex projects, business tasks, or development workflows. The squad includes a Manager, Architect, Coder, Reviewer, and Observer. Use when the user wants to "call a squad", "start a project", or "deploy squad" with specialized roles and quality control loops.
harmonia
Check PyTorch, Transformers, and CUDA compatibility. Detect GPU, driver mismatches, and version conflicts in ML environments. Use when the user sets up ML/AI tools, installs torch or transformers, hits dependency errors, or asks about compatible versions.
incident-postmortem-assistant
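Organizes incident clues into a postmortem draft, distinguishing root causes, triggers, amplifiers, impact, and remediation actions. Use for incident, postmortem, and SRE workflows; do not use for assigning personal blame or tampering with timelines.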
securityvitals
Security vitals checker, also known as ClawVitals. Scans your installation, scores your setup, and shows you exactly what to fix. First scan in seconds.
agent-cost-monitor
Real-time token usage and cost tracking across all your OpenClaw agents — alerts, budgets, and optimization tips