self-heal-watchdog
Automated self-healing system for OpenClaw gateway with model failover support. Three-layer protection: process watchdog (auto-restart on crash), config guard (backup + rollback), and model health check (auto-switch to fallback model on API failure). Use for: gateway monitoring, automatic recovery, model failover, crash recovery, uptime protection. Triggers: watchdog, self-heal, monitoring, recovery, failover, health check, gateway restart.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/bptravel2017/self-heal-watchdog
What This Skill Does
The self-heal-watchdog is an automated recovery system built for the OpenClaw gateway. It provides three layers of protection so that your AI infrastructure stays operational without manual intervention. It runs in the background, checks the health of the system on a recurring schedule, and responds to crashes, broken configuration, and model API failures.
It manages three safety nets: the Process Watchdog, which restarts the gateway process when it crashes; the Config Guard, which protects against configuration corruption by keeping a backup and rolling back to it; and the Model Health Check, which monitors model API endpoints and switches to a fallback model if the primary model stops responding. Centralizing these recovery workflows shortens the time to recovery and increases uptime for critical agent operations.
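The failover idea behind the Model Health Check can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name `pick_healthy_model`, the model names, and the stubbed probe are all assumptions, and a real probe would call the model API.

```python
from typing import Callable, Optional, Sequence


def pick_healthy_model(
    models: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first model whose health probe succeeds, or None."""
    for name in models:
        try:
            if is_healthy(name):
                return name
        except Exception:
            # A probe that raises (timeout, connection error) counts as unhealthy.
            continue
    return None


# Hypothetical model names; the lambda is a stub standing in for a real API probe.
down = {"primary-model"}
chosen = pick_healthy_model(
    ["primary-model", "fallback-model"],
    lambda name: name not in down,
)
print(chosen)  # fallback-model
```

Passing the probe in as a callable keeps the selection logic independent of any particular provider API, which also makes it easy to test.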
Installation
To integrate the watchdog into your system, run the installation command via your terminal:
clawhub install openclaw/skills/skills/bptravel2017/self-heal-watchdog
After installation, execute the setup script to register the watchdog with macOS launchd, which schedules a heartbeat check every 60 seconds:
./scripts/setup.sh
Because the job is registered with launchd, the watchdog keeps running after a system reboot and can restore your OpenClaw gateway.
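To illustrate what a 60-second launchd heartbeat might look like, the sketch below builds a launchd job definition with Python's `plistlib`. The label `com.example.openclaw.watchdog` and the script path are hypothetical, and the real `setup.sh` may register the job differently; `StartInterval` and `RunAtLoad` are standard launchd keys.

```python
import plistlib


def build_launchd_plist(label: str, script_path: str, interval_s: int = 60) -> bytes:
    """Serialize a launchd job that runs a script every `interval_s` seconds."""
    job = {
        "Label": label,
        "ProgramArguments": ["/bin/bash", script_path],
        "StartInterval": interval_s,  # launchd starts the job every N seconds
        "RunAtLoad": True,            # also run once when the job is loaded
    }
    return plistlib.dumps(job)  # XML plist by default


# Hypothetical label and path, for illustration only.
plist_bytes = build_launchd_plist(
    "com.example.openclaw.watchdog",
    "/path/to/watchdog.sh",
)
print(plist_bytes.decode("utf-8"))
```

A file like this would typically live under `~/Library/LaunchAgents/` for a per-user job.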
Use Cases
- Production Gateway Stability: Ensure your AI gateway stays online during high-traffic periods.
- Automated Failover Management: Automatically switch to lightweight fallback models when primary LLM providers experience downtime.
- Crash Recovery: Automatically restart hung or crashed gateway processes without human intervention.
- Configuration Resilience: Safeguard gateway settings, ensuring that invalid configuration edits are immediately rolled back to the last known good state.
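The backup-and-rollback behavior described under Configuration Resilience can be sketched as below. This assumes the gateway configuration is a JSON file and treats "valid" as "parses as JSON"; the function name `guard_config` and the file names are hypothetical, and the skill's own validation may be stricter.

```python
import json
import shutil
import tempfile
from pathlib import Path


def guard_config(config_path: Path, backup_path: Path) -> str:
    """Back up a valid config, or roll back to the backup if it is invalid."""
    try:
        json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        if backup_path.exists():
            shutil.copy2(backup_path, config_path)  # restore last known good
            return "rolled_back"
        return "invalid_no_backup"
    shutil.copy2(config_path, backup_path)  # current config becomes last known good
    return "backed_up"


# Demonstration in a temporary directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.json"
    bak = Path(tmp) / "config.json.bak"
    cfg.write_text('{"model": "primary"}', encoding="utf-8")
    first = guard_config(cfg, bak)   # valid, so a backup is taken
    cfg.write_text('{"model": ', encoding="utf-8")  # simulate a broken edit
    second = guard_config(cfg, bak)  # invalid, so the backup is restored
    restored = json.loads(cfg.read_text(encoding="utf-8"))

print(first, second, restored)
```

The backup is only refreshed when the current file validates, so a broken edit can never overwrite the last known good copy.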
Example Prompts
- "OpenClaw, run a health check on the gateway and show me if the watchdog has triggered any restarts today."
- "Perform a dry run of the watchdog to see if the current configuration is valid and the model endpoints are responding."
- "Force a model failover immediately to test the backup model configuration."
Tips & Limitations
- Logs: Always check ~/.openclaw/watchdog/watchdog.log if you notice unusual gateway behavior; the logs are automatically rotated to keep the file size manageable.
- Localhost Constraints: For security, the watchdog only monitors services on localhost. Remote monitoring is not supported via these scripts.
- Permissions: Ensure your user account has write access to ~/.openclaw/ for the Config Guard and status files to function correctly.
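The log rotation and write-access points above can be illustrated with Python's standard `RotatingFileHandler`. This is only a sketch of the general technique: the size limit, backup count, and logger name are assumptions, and the watchdog's own scripts may rotate logs by a different mechanism.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def make_watchdog_logger(log_dir: Path, max_bytes: int = 1_000_000,
                         backups: int = 3) -> logging.Logger:
    """Create a size-rotated logger after checking the directory is writable."""
    log_dir.mkdir(parents=True, exist_ok=True)
    if not os.access(log_dir, os.W_OK):
        raise PermissionError(f"no write access to {log_dir}")
    handler = RotatingFileHandler(
        log_dir / "watchdog.log", maxBytes=max_bytes, backupCount=backups
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("watchdog-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Demonstration in a temporary directory with a tiny size limit to force rotation.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp) / "watchdog"
    logger = make_watchdog_logger(log_dir, max_bytes=200, backups=2)
    for i in range(50):
        logger.info("heartbeat %d ok", i)
    log_files = sorted(p.name for p in log_dir.iterdir())
    for h in list(logger.handlers):  # release file handles before cleanup
        h.close()
        logger.removeHandler(h)

print(log_files)
```

With `backupCount` set, older files beyond that count are deleted on rollover, which is what bounds total disk use.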
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-bptravel2017-self-heal-watchdog": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: file-write, file-read, code-execution
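If clawhub.json already lists other plugins, the fragment above has to be merged into the existing "plugins" object rather than pasted over it. A minimal sketch of that merge follows; the helper `enable_plugin` and the `some-other-plugin` entry are hypothetical, and only the plugin ID and the `enabled` / `auto_update` keys come from the fragment itself.

```python
import json


def enable_plugin(config: dict, plugin_id: str, auto_update: bool = True) -> dict:
    """Return a copy of the config with the given plugin enabled."""
    updated = dict(config)
    plugins = dict(updated.get("plugins", {}))  # keep existing plugin entries
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    updated["plugins"] = plugins
    return updated


# Hypothetical existing config with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": True, "auto_update": False}}}
merged = enable_plugin(existing, "official-bptravel2017-self-heal-watchdog")
print(json.dumps(merged, indent=2))
```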