failover-gateway
Set up an active-passive failover gateway for OpenClaw. Deploy a standby node that auto-promotes when your primary goes down and auto-demotes when it recovers. Includes health monitor script, systemd services, channel splitting strategy, and step-by-step deployment guide. Use when you need high availability, disaster recovery, or redundancy for your OpenClaw instance.
Why use this skill?
Deploy a reliable high availability failover gateway for OpenClaw. Automate standby node promotion and ensure constant uptime for your AI agent.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/ember-claw/failover-gateway-pub
What This Skill Does
The failover-gateway skill provides an active-passive high-availability setup for OpenClaw. A standby VPS monitors your primary instance. If the primary OpenClaw node becomes unreachable, the health monitor triggers an automated promotion sequence and the standby node takes over communication responsibilities; when the primary recovers, the standby demotes itself. This limits service downtime by keeping at least one instance of your agent operational, although in-flight task state is not carried over between nodes (see Tips & Limitations).
Installation
- Provision a secondary, lightweight VPS and install the OpenClaw environment.
- Configure Tailscale or a similar VPN to allow encrypted communication between your primary and standby nodes.
- Run clawhub install openclaw/skills/skills/ember-claw/failover-gateway-pub on your standby machine.
- Initialize your workspace repository using Git to ensure both nodes stay synced.
- Modify your standby configuration to enable only the secondary channels it should own, so the two nodes never respond on the same channel.
- Deploy the included systemd services to initiate the health monitor, which polls the primary node every 30 seconds.
- Test by manually stopping the primary service to observe the standby promotion process.
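The promote/demote behavior described in the steps above can be sketched as a small state machine. This is an illustrative sketch, not the skill's bundled health monitor script: the class name FailoverMonitor, the failure and recovery thresholds, the TCP-based probe, and the run helper are all assumptions made for the example. Only the 30-second polling interval comes from the steps above.

```python
import socket
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FailoverMonitor:
    """Decides when the standby should promote or demote (hypothetical helper)."""
    failure_threshold: int = 3   # assumed: consecutive failed polls before promoting
    recovery_threshold: int = 2  # assumed: consecutive good polls before demoting
    promoted: bool = False
    _failures: int = 0
    _successes: int = 0

    def observe(self, primary_healthy: bool) -> str:
        """Record one poll result and return 'promote', 'demote', or 'none'."""
        if primary_healthy:
            self._failures = 0
            self._successes += 1
            if self.promoted and self._successes >= self.recovery_threshold:
                self.promoted = False
                return "demote"
        else:
            self._successes = 0
            self._failures += 1
            if not self.promoted and self._failures >= self.failure_threshold:
                self.promoted = True
                return "promote"
        return "none"


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Assumed health check: the primary counts as healthy if a TCP connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run(monitor: FailoverMonitor,
        probe: Callable[[], bool],
        on_action: Callable[[str], None],
        interval: float = 30.0,
        max_polls: Optional[int] = None) -> None:
    """Poll the primary every `interval` seconds and report state changes."""
    polls = 0
    while max_polls is None or polls < max_polls:
        action = monitor.observe(probe())
        if action != "none":
            on_action(action)  # e.g. start or stop the standby's gateway service
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Requiring several consecutive failures before promoting, and several consecutive successes before demoting, keeps a single dropped poll from flapping the standby. In a real deployment, on_action would start or stop whatever service the standby runs, and probe would wrap something like tcp_probe with the primary's VPN address and port (both placeholders here).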
Use Cases
- Mission-Critical Operations: Ideal for users managing automated trading or long-running tasks that cannot afford extended downtime.
- Geographical Redundancy: Deploying nodes in different regions to mitigate localized data center outages.
- Disaster Recovery: Creating a clean, minimal-resource recovery point that avoids the complexity of load balancing by using a channel-splitting strategy.
Example Prompts
- "OpenClaw, verify the current health status of my primary node and report if the failover-gateway is actively monitoring."
- "Update my failover-gateway configuration to prioritize Discord notifications as the secondary channel during a primary outage."
- "Show me the last timestamp when the standby node successfully polled the primary heartbeat."
Tips & Limitations
- Channel Splitting: This is the most critical component. By ensuring the primary and standby own different channels, you eliminate split-brain issues without complex database synchronization.
- Resource Allocation: You can save costs by running a smaller VPS for the standby, as it only needs enough power to handle essential recovery tasks.
- Limitations: This skill does not synchronize memory state between nodes. A task that is mid-execution during a failover will not resume from the point of failure; it can only be safely re-run on the standby if your workflow is idempotent.
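The channel-splitting tip above can be checked mechanically before deployment. The sketch below assumes a hypothetical configuration shape, a "channels" mapping whose entries carry an "enabled" flag; the actual OpenClaw configuration format may differ, and the channel names are placeholders.

```python
from typing import Any, List, Mapping, Set


def enabled_channels(config: Mapping[str, Any]) -> Set[str]:
    """Return the names of channels marked enabled in a (hypothetical) config dict."""
    channels = config.get("channels", {})
    return {name for name, opts in channels.items() if opts.get("enabled", False)}


def channel_conflicts(primary_cfg: Mapping[str, Any],
                      standby_cfg: Mapping[str, Any]) -> List[str]:
    """List channels enabled on both nodes; an empty list means the split is clean."""
    return sorted(enabled_channels(primary_cfg) & enabled_channels(standby_cfg))


# Placeholder configs for illustration only.
primary = {"channels": {"telegram": {"enabled": True}, "discord": {"enabled": False}}}
standby = {"channels": {"telegram": {"enabled": False}, "discord": {"enabled": True}}}
overlapping = {"channels": {"telegram": {"enabled": True}, "discord": {"enabled": True}}}

clean_split = channel_conflicts(primary, standby)
bad_split = channel_conflicts(primary, overlapping)
```

Here clean_split is empty because each node owns a different channel, while bad_split names the channel both nodes would answer on. Running a check like this after each configuration sync is one way to catch an accidental overlap early.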
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-ember-claw-failover-gateway-pub": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags(AI)
Flags: network-access, file-write, file-read, code-execution