Official Verified system Safety 4/5

self-heal-watchdog

Automated self-healing system for OpenClaw gateway with model failover support. Three-layer protection: process watchdog (auto-restart on crash), config guard (backup + rollback), and model health check (auto-switch to fallback model on API failure). Use for: gateway monitoring, automatic recovery, model failover, crash recovery, uptime protection. Triggers: watchdog, self-heal, monitoring, recovery, failover, health check, gateway restart.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/bptravel2017/self-heal-watchdog

What This Skill Does

The self-heal-watchdog is an automated recovery system built specifically for the OpenClaw gateway. It runs as a background daemon and applies three layers of protection, so the gateway can recover from crashes, configuration corruption, and model API failures without manual intervention.

At its core, it manages three safety nets: the Process Watchdog, which restarts the gateway process when it crashes; the Config Guard, which backs up the configuration and rolls back to the backup if it becomes corrupted; and the Model Health Check, which monitors the model API and switches to a fallback model when the primary model stops responding. Centralizing these recovery workflows shortens troubleshooting and helps keep critical agent operations available.
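
The failover decision made by the Model Health Check can be pictured with a minimal sketch. This is not the skill's actual code: the function name `pick_healthy_model` and the `is_healthy` probe are hypothetical, and the sketch assumes that models are tried in priority order and that a probe which raises an error counts as unhealthy.

```python
from typing import Callable, Optional, Sequence


def pick_healthy_model(
    models: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first model whose health probe succeeds, or None.

    `models` is ordered by preference: primary first, then fallbacks.
    `is_healthy` is a caller-supplied probe (hypothetical; the real
    skill's probe is not documented here).
    """
    for name in models:
        try:
            if is_healthy(name):
                return name
        except Exception:
            # A probe that raises (timeout, connection error, ...) is
            # treated the same as an unhealthy model.
            continue
    return None
```

Passing the probe in as a parameter keeps the selection logic independent of any particular API client, so it can be exercised with a stub instead of a live endpoint.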

Installation

To integrate the watchdog into your system, run the installation command via your terminal:

clawhub install openclaw/skills/skills/bptravel2017/self-heal-watchdog

After installation, execute the setup script to register the watchdog with macOS launchd, which schedules a heartbeat check every 60 seconds:

./scripts/setup.sh

Because the watchdog is registered with launchd, it is intended to resume its checks after a system reboot and restore the OpenClaw gateway if it is not running.
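
For readers unfamiliar with launchd, the sketch below shows what a 60-second job definition generally looks like. The contents that `setup.sh` actually writes are not shown on this page, so the label `com.example.self-heal-watchdog` and the script path are placeholders, and the helper `build_launchd_job` is hypothetical. `StartInterval` and `RunAtLoad` are standard launchd keys.

```python
import plistlib


def build_launchd_job(label: str, script_path: str, interval_seconds: int = 60) -> bytes:
    """Serialize a launchd job definition as an XML property list."""
    job = {
        "Label": label,                              # unique job identifier
        "ProgramArguments": ["/bin/sh", script_path],
        "StartInterval": interval_seconds,           # run every N seconds
        "RunAtLoad": True,                           # also run when the job is loaded
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    # Placeholder label and path, for illustration only.
    xml = build_launchd_job("com.example.self-heal-watchdog", "/path/to/watchdog.sh")
    print(xml.decode("utf-8"))
```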

Use Cases

  • Production Gateway Stability: Ensure your AI gateway stays online during high-traffic periods.
  • Automated Failover Management: Automatically switch to lightweight fallback models when primary LLM providers experience downtime.
  • Crash Recovery: Automatically restart hung or crashed gateway processes without human intervention.
  • Configuration Resilience: Safeguard gateway settings, ensuring that invalid configuration edits are immediately rolled back to the last known good state.
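
The backup-and-rollback behavior described under Configuration Resilience can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: it assumes the gateway configuration is a JSON file and that "valid" simply means the file parses. The function name `guard_config` and its return values are hypothetical.

```python
import json
import shutil
from pathlib import Path


def guard_config(config_path: Path, backup_path: Path) -> str:
    """Back up a valid config, or restore the last backup if it is invalid.

    Returns "backed_up", "rolled_back", or "invalid_no_backup".
    """
    try:
        # Assumption: a config is "valid" if it is readable, parseable JSON.
        json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed config: restore the last good copy.
        if backup_path.exists():
            shutil.copy2(backup_path, config_path)
            return "rolled_back"
        return "invalid_no_backup"
    # Config parsed cleanly, so it becomes the new last known good state.
    shutil.copy2(config_path, backup_path)
    return "backed_up"
```

Refreshing the backup only after a successful parse is what keeps a broken edit from overwriting the last known good copy.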

Example Prompts

  • "OpenClaw, run a health check on the gateway and show me if the watchdog has triggered any restarts today."
  • "Perform a dry run of the watchdog to see if the current configuration is valid and the model endpoints are responding."
  • "Force a model failover immediately to test the backup model configuration."

Tips & Limitations

  • Logs: Always check ~/.openclaw/watchdog/watchdog.log if you notice unusual gateway behavior; the logs are automatically rotated to keep the file size manageable.
  • Localhost Constraints: For security, the watchdog only monitors services on localhost. Remote monitoring is not supported via these scripts.
  • Permissions: Ensure your user account has write access to ~/.openclaw/ for the Config Guard and status files to function correctly.
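
To verify the permissions requirement before running the watchdog, a quick check along these lines may help. The helper name `can_write` is hypothetical, and the sketch assumes the default location `~/.openclaw/` mentioned above.

```python
import os
from pathlib import Path


def can_write(directory: Path) -> bool:
    """Return True if `directory` exists and the current user can create files in it."""
    # Creating files in a directory needs both write and search (execute) permission.
    return directory.is_dir() and os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    target = Path.home() / ".openclaw"
    print(f"{target}: {'writable' if can_write(target) else 'NOT writable'}")
```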

Metadata

Stars: 4190
Views: 1
Updated: 2026-04-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-bptravel2017-self-heal-watchdog": {
      "enabled": true,
      "auto_update": true
    }
  }
}
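
If clawhub.json already lists other plugins, pasting the snippet over the file would drop them. The sketch below shows one way to merge the entry instead; it assumes the file follows the `plugins` layout shown above, and the helper name `enable_plugin` is hypothetical rather than part of any ClawHub tooling.

```python
import copy


def enable_plugin(config: dict, plugin_id: str, auto_update: bool = True) -> dict:
    """Return a copy of `config` with `plugin_id` enabled; other plugins are kept."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    plugins = merged.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    return merged
```

Loading the existing file with `json.load`, passing the result through this function, and writing it back with `json.dump` would preserve the other entries.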

Tags (AI)

#monitoring #recovery #failover #uptime #gateway
Safety Score: 4/5

Flags: file-write, file-read, code-execution