
Gateway WebSocket Timeout After Reboot

Known Open Issue

This behavior is tracked in openclaw/openclaw#20958. The Gateway starts after reboot, but the WebSocket health-check handshake never completes: the UI shows repeated reconnect attempts, and the status oscillates between "connecting" and "timeout".

After a machine reboot, the Gateway process starts normally (you can see it in ps aux or launchctl list), but clients cannot establish a stable WebSocket connection. The health endpoint may respond but the handshake never upgrades to a persistent channel.

What the Error Looks Like

You'll see one or more of these in Gateway logs or the Control UI after a reboot:

gateway health check timeout after 10000ms
websocket handshake failed — retrying in 5000ms
connection upgraded but channel not established
[gateway] status: oscillating (healthy → timeout → healthy)
launchctl kickstart: service already running but not responding

The key clue is that the Gateway process is running (you can verify with ps aux | grep openclaw) but WebSocket connections never stabilize. This distinguishes it from a process crash (where the process is missing) or a port conflict (where you'd see EADDRINUSE).
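
The distinction above can be sketched as a small triage helper. This is a minimal illustration, not part of the OpenClaw CLI: classify_failure and its output labels are hypothetical names, and the log patterns are taken from the sample lines listed earlier on this page.

```shell
# Hypothetical helper: classify a Gateway failure from two observations.
#   $1    - "yes" if a gateway process is running, anything else if not
#   stdin - recent Gateway log lines
classify_failure() {
  cf_logs=$(cat)
  if [ "$1" != "yes" ]; then
    echo "process-missing"      # crash: the process is gone
  elif printf '%s\n' "$cf_logs" | grep -q 'EADDRINUSE'; then
    echo "port-conflict"        # another process holds the port
  elif printf '%s\n' "$cf_logs" | grep -Eqi 'health check timeout|websocket handshake failed'; then
    echo "websocket-timeout"    # the issue described on this page
  else
    echo "unknown"
  fi
}

# Example usage (left as comments so the block has no side effects):
#   if pgrep -f openclaw >/dev/null; then running=yes; else running=no; fi
#   openclaw logs --tail 50 | classify_failure "$running"
```

Only the "websocket-timeout" result points at the fixes on this page; the other two results call for different troubleshooting.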

Why This Happens

1. Stale WebSocket state file

The Gateway writes connection state to a file during normal operation. On unclean shutdown (power loss, kernel panic, force restart), this file is not cleaned up. When the Gateway starts again, it reads the stale state and tries to resume connections that no longer exist, which causes health-check loops.

2. Lock file from previous PID

The Gateway uses a PID lock file to prevent duplicate instances. After reboot, the old PID is invalid but the lock file persists. The new process detects the lock and enters a degraded mode in which it starts but doesn't fully initialize the WebSocket listener.

3. LaunchAgent timing race

On macOS, the LaunchAgent may start the Gateway before the network stack is fully ready. The initial WebSocket bind succeeds on localhost but external connections fail until the network interface is up. By then, the Gateway is stuck in a retry loop.
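
To illustrate the second cause, the sketch below tests whether a lock file points at a process that no longer exists. It assumes the lock file's first line is a bare numeric PID, which this page does not confirm, and lock_is_stale is a hypothetical helper name.

```shell
# Hypothetical helper: exit 0 if the lock file names a PID that is not alive.
# ASSUMPTION: the first line of the lock file is a numeric PID.
lock_is_stale() {
  [ -f "$1" ] || return 1                     # no lock file, so nothing is stale
  lis_pid=$(head -n 1 "$1" | tr -cd '0-9')    # keep digits only
  [ -n "$lis_pid" ] || return 0               # no PID found: treat as stale
  # kill -0 sends no signal; it only tests whether the PID exists and
  # can be signalled by the current user.
  if kill -0 "$lis_pid" 2>/dev/null; then
    return 1                                  # a live process owns this PID
  fi
  return 0
}

# Example usage (left as a comment so the block has no side effects):
#   lock_is_stale ~/.openclaw/gateway/.lock && rm -f ~/.openclaw/gateway/.lock
```

The check is only a heuristic: after a reboot the old PID may have been reused by an unrelated process, in which case the lock looks live even though the Gateway does not own it.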

Fix A: macOS (LaunchAgent)

This is the most common scenario. The LaunchAgent starts the Gateway but the WebSocket channel is stuck.

Step 1: Clear stale state files
rm -f ~/.openclaw/gateway/.lock
rm -f ~/.openclaw/gateway/ws-state.json
Step 2: Force-restart the LaunchAgent
launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway
Step 3: Verify recovery
openclaw gateway status
openclaw doctor

The -k flag in launchctl kickstart kills the existing process before restarting it, which is critical here. Without it, launchctl sees that the process is already running and does nothing.

Fix B: Linux (systemd)

Step 1: Clear stale state files
rm -f ~/.openclaw/gateway/.lock
rm -f ~/.openclaw/gateway/ws-state.json
Step 2: Restart the systemd service
sudo systemctl restart openclaw-gateway
Step 3: Verify
systemctl status openclaw-gateway
openclaw gateway status
openclaw doctor

If systemctl restart fails with "Unit openclaw-gateway.service not found", the unit file was never installed. Run openclaw gateway install first, then sudo systemctl enable --now openclaw-gateway.

Fix C: Docker

Docker does not restart containers after a reboot unless you set a restart policy. If your container stopped on reboot, that is the cause, not a WebSocket bug.

Check and restart container
# Check container status
docker ps -a | grep openclaw

# If status is "Exited", restart
docker compose up -d openclaw-gateway

# Add restart policy to prevent this
# In docker-compose.yml:
# services:
#   openclaw-gateway:
#     restart: unless-stopped

If the container is running but WebSocket still times out, clear the state files inside it and restart:

Clear state inside container
docker compose exec openclaw-gateway rm -f /app/.openclaw/gateway/.lock /app/.openclaw/gateway/ws-state.json
docker compose restart openclaw-gateway
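
The restart policy mentioned above can also be checked and applied to an existing container without editing docker-compose.yml. The sketch below uses the standard docker inspect and docker update commands; ensure_restart_policy is a hypothetical helper name, and the container name is an assumption (Compose often generates a name that differs from the service name).

```shell
# Sketch: make sure a container is restarted automatically after a reboot.
#   $1 - container name as shown by `docker ps -a`
ensure_restart_policy() {
  erp_policy=$(docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' "$1") || return 1
  case "$erp_policy" in
    always|unless-stopped)
      echo "restart policy already set: $erp_policy" ;;
    *)
      docker update --restart unless-stopped "$1" >/dev/null || return 1
      echo "restart policy changed: ${erp_policy:-none} -> unless-stopped" ;;
  esac
}

# Example usage (left as a comment so the block has no side effects):
#   ensure_restart_policy openclaw-gateway
```

docker update only changes the existing container. If Compose recreates the container, the policy comes from docker-compose.yml again, so the restart: unless-stopped line is still worth adding there.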

Verify Recovery

After running the fix for your platform, verify all three checks pass:

Gateway status reports healthy
openclaw gateway status

Expected: Status: running, Port: 18789, Health: ok

No timeout lines in recent logs
openclaw logs --tail 50 | grep -i timeout

Expected: No output (empty = good)

Client reconnects and stays connected

Expected: In the Control UI, the connection indicator turns green and stays green for 2+ minutes

Full verification sequence
openclaw doctor
openclaw gateway status
openclaw logs --tail 100 | grep -Ei "timeout|reconnect|restart|health"
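
A single healthy reading does not rule out the oscillation described earlier, so it can help to watch the status for a couple of minutes. The sketch below polls openclaw gateway status and stops at the first unhealthy reading. It assumes the command prints the literal text "Health: ok" as in the expected output above, and stays_healthy is a hypothetical helper name.

```shell
# Sketch: poll the Gateway and fail on the first unhealthy reading.
#   $1 - total seconds to watch (default 120)
#   $2 - seconds between checks (default 10, must be greater than 0)
# ASSUMPTION: a healthy Gateway prints the literal text "Health: ok".
stays_healthy() {
  sh_remaining=${1:-120}
  sh_interval=${2:-10}
  while [ "$sh_remaining" -gt 0 ]; do
    if ! openclaw gateway status 2>/dev/null | grep -q 'Health: ok'; then
      echo "unhealthy with ${sh_remaining}s left to watch" >&2
      return 1
    fi
    sleep "$sh_interval"
    sh_remaining=$((sh_remaining - sh_interval))
  done
  echo "healthy for the full watch window"
}

# Example usage (left as a comment so the block has no side effects):
#   stays_healthy 120 10
```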

Prevent Recurrence

1. Add a post-boot health check script

Create a script that runs 30 seconds after boot, checks openclaw gateway status, and runs launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway if health is not "ok". On macOS, register it as a separate LaunchAgent with RunAtLoad and StartInterval set.

2. Enable automatic lock cleanup

Add rm -f ~/.openclaw/gateway/.lock to your gateway start script or systemd ExecStartPre directive. This ensures stale locks from unclean shutdowns are always cleared before the gateway process starts.

3. Pin to a single process manager

Don't mix launchctl, systemd, and manual openclaw gateway start. Pick one and stick with it. Mixing process managers creates competing lock files and PID conflicts that cause exactly this issue.

4. Monitor upstream issue #20958

When this issue is closed with a fixed version, update and remove the workaround scripts. The permanent fix will likely include automatic stale-state cleanup on startup.
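
The post-boot check from the first item could look roughly like the sketch below for macOS. It is a starting point under stated assumptions rather than a tested workaround: post_boot_check is a hypothetical name, and the script assumes that a healthy Gateway prints the literal text "Health: ok".

```shell
# Sketch of a post-boot health check for macOS.
#   $1 - seconds to wait before checking (default 30)
# ASSUMPTION: a healthy Gateway prints the literal text "Health: ok".
post_boot_check() {
  sleep "${1:-30}"
  if openclaw gateway status 2>/dev/null | grep -q 'Health: ok'; then
    echo "gateway healthy, nothing to do"
    return 0
  fi
  # Clear the stale lock first, then force a restart with -k.
  rm -f "$HOME/.openclaw/gateway/.lock"
  launchctl kickstart -k "gui/$(id -u)/ai.openclaw.gateway" || return 1
  echo "gateway restarted"
}

# Example usage (left as a comment so the block has no side effects):
#   post_boot_check 30
```

On Linux, the same shape applies with sudo systemctl restart openclaw-gateway in place of the launchctl call.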

Keep this page in workaround mode until issue #20958 is closed with a specific fixed version.

Still Stuck?

Stream logs in real time while attempting to connect from the Control UI:

Live debug
openclaw logs --follow | grep -Ei 'websocket|health|timeout|handshake|reconnect'

Run the Doctor

npx clawkit-doctor@latest

Checks gateway health, port availability, lock file state, and LaunchAgent registration.
