Official Verified system Safety 4/5

failover-gateway

Set up an active-passive failover gateway for OpenClaw. Deploy a standby node that auto-promotes when your primary goes down and auto-demotes when the primary recovers. Includes a health monitor script, systemd services, a channel-splitting strategy, and a step-by-step deployment guide. Use it when you need high availability, disaster recovery, or redundancy for your OpenClaw instance.

Why use this skill?

Deploy a high-availability failover gateway for OpenClaw. Automate standby-node promotion and keep your AI agent reachable when the primary node fails.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/ember-claw/failover-gateway-pub

What This Skill Does

The failover-gateway skill provides an active-passive high-availability setup for OpenClaw. A standby VPS monitors your primary instance; if the primary becomes unreachable, the health monitor triggers an automated promotion sequence and the standby takes over communication responsibilities. Because at least one instance of your agent stays operational, service downtime is kept short. In-flight task state is not carried over between nodes, however (see Tips & Limitations), so this design reduces downtime rather than guaranteeing zero data loss.
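The promote/demote behavior described above can be sketched as a small state machine. This is a minimal illustration, not the skill's actual health monitor: the class `FailoverState` and the thresholds `FAILURES_TO_PROMOTE` and `SUCCESSES_TO_DEMOTE` are hypothetical, and requiring several consecutive poll results before switching roles is an assumed design choice that keeps a single missed poll from causing a flap.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the skill's real values are not documented here.
FAILURES_TO_PROMOTE = 3  # consecutive failed polls before the standby takes over
SUCCESSES_TO_DEMOTE = 3  # consecutive healthy polls before it hands control back


@dataclass
class FailoverState:
    role: str = "standby"  # "standby" or "active"
    failures: int = 0      # consecutive failed health checks of the primary
    successes: int = 0     # consecutive successful health checks of the primary

    def observe(self, primary_healthy: bool) -> str:
        """Record one poll result and return the (possibly new) role."""
        if primary_healthy:
            self.successes += 1
            self.failures = 0
            if self.role == "active" and self.successes >= SUCCESSES_TO_DEMOTE:
                self.role = "standby"  # primary has recovered: auto-demote
        else:
            self.failures += 1
            self.successes = 0
            if self.role == "standby" and self.failures >= FAILURES_TO_PROMOTE:
                self.role = "active"  # primary looks down: auto-promote
        return self.role
```

Fed one poll result at a time, three failures in a row promote the standby on the third poll, and three healthy polls afterwards demote it again; a lone failure followed by a success leaves it in standby.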

Installation

  1. Provision a secondary, lightweight VPS and install the OpenClaw environment.
  2. Configure Tailscale or a similar VPN so the primary and standby nodes communicate over an encrypted link.
  3. Run clawhub install openclaw/skills/skills/ember-claw/failover-gateway-pub on your standby machine.
  4. Keep your workspace in a Git repository that both nodes can reach, and pull regularly on the standby so its copy stays in sync with the primary.
  5. Edit the standby configuration so it enables only the secondary channels assigned to it, not the channels the primary owns, to avoid conflicts.
  6. Deploy the included systemd services to start the health monitor, which polls the primary node every 30 seconds.
  7. Test failover by manually stopping the primary service and watching the standby promote itself.
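A health probe on the 30-second poll from step 6 might look roughly like this. It is a sketch under stated assumptions, not the bundled monitor script: it treats "primary is up" as "a TCP connection to the primary's gateway port succeeds", and `PRIMARY_HOST`, `PRIMARY_PORT`, `primary_is_up`, and `monitor` are hypothetical names with placeholder values that you would replace with your primary's Tailscale address and port.

```python
import socket
import time
from typing import Callable, Optional

POLL_INTERVAL_S = 30  # the 30-second interval described in step 6

# Hypothetical placeholders: use your primary's Tailscale address and gateway port.
PRIMARY_HOST = "100.64.0.1"
PRIMARY_PORT = 18789


def primary_is_up(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the primary succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def monitor(check: Callable[[], bool],
            on_result: Callable[[bool], None],
            interval: float = POLL_INTERVAL_S,
            max_polls: Optional[int] = None) -> None:
    """Call `check` every `interval` seconds and pass each result to `on_result`.

    Runs forever when `max_polls` is None (the normal case under systemd).
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        on_result(check())
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Calling `monitor(lambda: primary_is_up(PRIMARY_HOST, PRIMARY_PORT), print)` would print one result every 30 seconds; in a real deployment the callback would drive promotion and demotion, and a systemd unit would keep the process running. A plain TCP check only shows that the port is open; an application-level health endpoint, if your setup exposes one, is a stronger signal.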

Use Cases

  • Mission-Critical Operations: Ideal for users managing automated trading or long-running tasks that cannot afford extended downtime.
  • Geographical Redundancy: Deploying nodes in different regions to mitigate localized data center outages.
  • Disaster Recovery: Creating a clean, minimal-resource recovery point that avoids the complexity of load balancing by using a channel-splitting strategy.

Example Prompts

  1. "OpenClaw, verify the current health status of my primary node and report if the failover-gateway is actively monitoring."
  2. "Update my failover-gateway configuration to prioritize Discord notifications as the secondary channel during a primary outage."
  3. "Show me the last timestamp when the standby node successfully polled the primary heartbeat."

Tips & Limitations

  • Channel Splitting: This is the most critical component. Because the primary and standby own different channels, the two nodes do not both answer on the same channel, which avoids split-brain issues without complex database synchronization.
  • Resource Allocation: You can save costs by running a smaller VPS for the standby, as it only needs enough power to handle essential recovery tasks.
  • Limitations: This skill does not synchronize memory state between nodes. If a task is mid-execution during a failover, it may not resume cleanly from the point of failure; design workflows to be idempotent so they can be safely re-run on the other node.
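Since channel splitting is what prevents split-brain, it is worth checking before deployment that no channel is enabled on both nodes. The sketch below assumes you can list each node's enabled channels as plain names; `PRIMARY_CHANNELS`, `STANDBY_CHANNELS`, and `check_channel_split` are hypothetical, and the channel names are placeholders rather than anything the skill requires.

```python
from __future__ import annotations

# Hypothetical channel assignments: substitute the channels each node actually enables.
PRIMARY_CHANNELS = {"telegram", "slack"}
STANDBY_CHANNELS = {"discord"}


def check_channel_split(primary: set[str], standby: set[str]) -> None:
    """Raise ValueError if any channel is enabled on both nodes (a split-brain risk)."""
    overlap = primary & standby
    if overlap:
        raise ValueError(f"channels enabled on both nodes: {sorted(overlap)}")


# Passes silently for the placeholder assignments above.
check_channel_split(PRIMARY_CHANNELS, STANDBY_CHANNELS)
```

Running a check like this as part of deployment turns a configuration mistake into an immediate error instead of duplicate replies during an outage.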

Metadata

Stars: 2387
Views: 1
Updated: 2026-03-09
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-ember-claw-failover-gateway-pub": {
      "enabled": true,
      "auto_update": true
    }
  }
}
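Rather than replacing the whole file, you can merge the entry into an existing clawhub.json so any plugins already listed are kept. This is a hypothetical helper (`add_plugin` is not part of clawhub), and it assumes clawhub.json is a plain JSON object with an optional top-level "plugins" object, as in the snippet above.

```python
import json
from pathlib import Path

# Plugin key and settings copied from the snippet above.
PLUGIN_ID = "official-ember-claw-failover-gateway-pub"
PLUGIN_ENTRY = {"enabled": True, "auto_update": True}


def add_plugin(config_path: Path) -> dict:
    """Merge the failover-gateway entry into clawhub.json, preserving other plugins."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("plugins", {})[PLUGIN_ID] = dict(PLUGIN_ENTRY)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_plugin(Path("clawhub.json"))` creates the file if it is missing and otherwise adds or overwrites only this plugin's entry.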

Tags(AI)

#high-availability #failover #system-admin #disaster-recovery #automation
Safety Score: 4/5

Flags: network-access, file-write, file-read, code-execution