Why does the Gateway time out after reboot?

The LaunchAgent or systemd service starts the gateway process, but a stale WebSocket state file or port lock from the previous session prevents the new health-check handshake from completing.

What is the confirmed recovery action?

Kickstart the gateway service (macOS) or restart the systemd unit (Linux), clear any stale lock files, then validate with openclaw doctor.

Is this permanently fixed upstream?

No confirmed permanent fix yet. Issue #20958 remains open, so treat this as an operational recovery runbook.

How do I ensure recovery is complete?

Run openclaw doctor, check that openclaw gateway status reports healthy, and confirm that the logs show no repeated restart or timeout lines for at least 2 minutes.

Gateway WebSocket Timeout After Reboot

Known Open Issue

This behavior is tracked in openclaw/openclaw#20958. The Gateway starts after reboot but the WebSocket health-check handshake never completes — the UI shows repeated reconnect attempts and the status oscillates between "connecting" and "timeout".

After a machine reboot, the Gateway process starts normally (you can see it in ps aux or launchctl list), but clients cannot establish a stable WebSocket connection. The health endpoint may respond but the handshake never upgrades to a persistent channel.

What the Error Looks Like

You'll see one or more of these in Gateway logs or the Control UI after a reboot:

gateway health check timeout after 10000ms

websocket handshake failed — retrying in 5000ms

connection upgraded but channel not established

[gateway] status: oscillating (healthy → timeout → healthy)

launchctl kickstart: service already running but not responding

The key clue is that the Gateway process is running (you can verify with ps aux | grep openclaw) but WebSocket connections never stabilize. This distinguishes it from a process crash (where the process is missing) or a port conflict (where you'd see EADDRINUSE).
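The distinction above can be sketched as a small log classifier. This is only an illustration: classify_gateway_failure is a hypothetical helper, not part of the openclaw CLI, and it assumes the log strings match the samples listed in this section.

```shell
# Hypothetical helper: classify a post-reboot failure from log text on stdin.
# $1 is "running" or "missing", e.g. decided from `ps aux | grep openclaw`.
classify_gateway_failure() {
    proc_state="$1"
    log_text="$(cat)"
    if [ "$proc_state" != "running" ]; then
        # No process at all: this is a crash, not the timeout issue.
        echo "process-crash"
    elif printf '%s\n' "$log_text" | grep -q 'EADDRINUSE'; then
        # Another process holds the port.
        echo "port-conflict"
    elif printf '%s\n' "$log_text" |
        grep -Eiq 'health check timeout|handshake failed|channel not established'; then
        # Process is up but the WebSocket channel never stabilizes.
        echo "websocket-timeout"
    else
        echo "unknown"
    fi
}
```

For example, openclaw logs --tail 50 | classify_gateway_failure running should print websocket-timeout when the symptoms match this guide.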

Why This Happens

Stale WebSocket state file

The Gateway writes connection state to a file during normal operation. On unclean shutdown (power loss, kernel panic, force restart), this file is not cleaned up. When the Gateway starts again, it reads the stale state and tries to resume connections that no longer exist — causing health-check loops.

Lock file from previous PID

The Gateway uses a PID lock file to prevent duplicate instances. After reboot, the old PID is invalid but the lock file persists. The new process detects the lock and enters a degraded mode in which it starts but doesn't fully initialize the WebSocket listener.

LaunchAgent timing race

On macOS, the LaunchAgent may start the Gateway before the network stack is fully ready. The initial WebSocket bind succeeds on localhost but external connections fail until the network interface is up. By then, the Gateway is stuck in a retry loop.
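The stale-lock case described above can be checked by hand before deleting anything. The sketch below assumes the lock file stores the owning PID as plain text on its first line; the guide does not confirm the file format, and lock_is_stale is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) when the lock file looks stale.
# Assumption: the first line of the lock file is the owning PID.
lock_is_stale() {
    lock_file="$1"
    # No lock file means there is nothing stale to clean up.
    [ -f "$lock_file" ] || return 1
    pid="$(head -n 1 "$lock_file" | tr -cd '0-9')"
    # A lock without a readable PID cannot belong to a live process.
    [ -n "$pid" ] || return 0
    # kill -0 sends no signal; it only tests whether we can signal the PID.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

Usage might look like lock_is_stale ~/.openclaw/gateway/.lock && rm -f ~/.openclaw/gateway/.lock. Two caveats: kill -0 also fails for a process owned by another user, and after a reboot the old PID may have been reused by an unrelated process, in which case the lock looks live even though it is not.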

Fix A: macOS (LaunchAgent)

This is the most common scenario. The LaunchAgent starts the Gateway but the WebSocket channel is stuck.

Step 1 — Force-restart the LaunchAgent

launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway

Step 2 — Clear stale state files

rm -f ~/.openclaw/gateway/.lock
rm -f ~/.openclaw/gateway/ws-state.json

Step 3 — Verify recovery

openclaw gateway status
openclaw doctor

The -k flag in launchctl kickstart kills the existing process before restarting — this is critical. Without it, launchctl sees the process is "running" and does nothing.

Fix B: Linux (systemd)

Step 1 — Restart the systemd service

sudo systemctl restart openclaw-gateway

Step 2 — Clear stale state files

rm -f ~/.openclaw/gateway/.lock
rm -f ~/.openclaw/gateway/ws-state.json

Step 3 — Verify

systemctl status openclaw-gateway
openclaw gateway status
openclaw doctor

If systemctl restart reports that the unit was not found (for example, "Unit openclaw-gateway.service not found"), the unit file was never installed. Run openclaw gateway install first, then sudo systemctl enable --now openclaw-gateway.

Fix C: Docker

Docker containers are not restarted automatically after a reboot unless you set a restart policy. If your container stopped on reboot, that's the cause, not a WebSocket bug.

Check and restart container

# Check container status
docker ps -a | grep openclaw

# If status is "Exited", restart
docker compose up -d openclaw-gateway

# Add restart policy to prevent this
# In docker-compose.yml:
# services:
#   openclaw-gateway:
#     restart: unless-stopped
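To see which restart policy a container actually has, docker inspect can print it. The sketch below wraps that value in a small classifier; restart_policy_verdict is a hypothetical helper, and the container name openclaw-gateway is taken from the commands in this guide.

```shell
# Hypothetical helper: interpret a Docker restart policy name.
restart_policy_verdict() {
    case "$1" in
        # Both policies bring the container back when the Docker daemon starts.
        always|unless-stopped) echo "restarts-on-boot" ;;
        # Default policy: the container stays stopped after a reboot.
        no|"")                 echo "stays-stopped" ;;
        # Anything else (e.g. on-failure) depends on how the container exited.
        *)                     echo "check-manually" ;;
    esac
}

# Usage, not run here because it needs a Docker daemon:
# restart_policy_verdict "$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' openclaw-gateway)"
```

Note that unless-stopped leaves the container down if it was stopped manually before the reboot, and neither policy helps if the Docker daemon itself is not enabled at boot.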

If the container is running but WebSocket still times out, exec into it and clear state:

Clear state inside container

docker exec openclaw-gateway rm -f /app/.openclaw/gateway/.lock
docker compose restart openclaw-gateway

Verify Recovery

After running the fix for your platform, verify all three checks pass:

Gateway status reports healthy

openclaw gateway status

Expected: Status: running, Port: 18789, Health: ok

No timeout lines in recent logs

openclaw logs --tail 50 | grep -i timeout

Expected: No output (empty = good)

Client reconnects and stays connected

Open the Control UI. Expected: the connection indicator turns green and stays green for 2+ minutes.

Full verification sequence

openclaw doctor
openclaw gateway status
openclaw logs --tail 100 | grep -Ei "timeout|reconnect|restart|health"
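Because the symptom is an oscillating status, a single healthy reading is not enough. The sketch below polls a status command repeatedly and fails on the first unhealthy reading. It assumes the status output contains the literal text "Health: ok", as in the expected output above; health_stays_ok is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if every poll reports "Health: ok".
# $1 = number of polls, $2 = seconds between polls, remaining args = status command.
health_stays_ok() {
    polls="$1"
    interval="$2"
    shift 2
    i=0
    while [ "$i" -lt "$polls" ]; do
        # Fail fast on the first reading that is not healthy.
        if ! "$@" 2>/dev/null | grep -q 'Health: ok'; then
            return 1
        fi
        i=$((i + 1))
        # Sleep between polls, but not after the last one.
        if [ "$i" -lt "$polls" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For a 2-minute window, health_stays_ok 13 10 openclaw gateway status takes 13 readings 10 seconds apart (12 intervals, 120 seconds in total).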

Prevent Recurrence

Add a post-boot health check script

Create a script that runs 30 seconds after boot, checks openclaw gateway status, and runs launchctl kickstart -k if the reported health is not "ok". On macOS, install it as a separate LaunchAgent with RunAtLoad and StartInterval set, so it runs at login and then repeats on a schedule.
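A minimal sketch of such a script for macOS follows. It assumes the status output contains "Health: ok" when the gateway is healthy and reuses the LaunchAgent label ai.openclaw.gateway from Fix A; post_boot_check is a hypothetical function name.

```shell
# Hypothetical post-boot check for macOS.
# $1 = seconds to wait before checking (defaults to 30).
post_boot_check() {
    delay="${1:-30}"
    sleep "$delay"
    if openclaw gateway status 2>/dev/null | grep -q 'Health: ok'; then
        echo "gateway healthy"
    else
        echo "gateway unhealthy, kickstarting"
        # -k kills the stuck process before launchd restarts it.
        launchctl kickstart -k "gui/$(id -u)/ai.openclaw.gateway"
    fi
}

# Call post_boot_check at the end of the script that the LaunchAgent runs.
```

If the LaunchAgent already uses StartInterval to repeat the check, a shorter delay on later runs is probably fine; the 30-second wait matters mainly on the first run after boot.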

Enable automatic lock cleanup

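Add rm -f ~/.openclaw/gateway/.lock to your gateway start script or to a systemd ExecStartPre directive. In a systemd unit, write the lock path as an absolute path, because systemd does not perform shell tilde expansion. This ensures stale locks from unclean shutdowns are always cleared before the gateway process starts.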
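For systemd, one way to do this is a drop-in file rather than editing the unit itself. The sketch below only prints the drop-in contents; print_lock_cleanup_dropin is a hypothetical helper, and the home directory in the usage line is a placeholder.

```shell
# Hypothetical helper: print a systemd drop-in that removes the lock before start.
# $1 = absolute path to the lock file.
print_lock_cleanup_dropin() {
    lock_path="$1"
    case "$lock_path" in
        /*) ;;
        *)
            echo "lock path must be absolute" >&2
            return 1
            ;;
    esac
    # The leading "-" tells systemd to ignore a failure of this command.
    printf '[Service]\nExecStartPre=-/bin/rm -f %s\n' "$lock_path"
}

# Usage (placeholder home directory):
# print_lock_cleanup_dropin /home/alice/.openclaw/gateway/.lock
```

The output can be pasted into the editor opened by sudo systemctl edit openclaw-gateway, which creates the drop-in and reloads systemd on save.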

Pin to a single process manager

Don't mix launchctl, systemd, and manual openclaw gateway start. Pick one and stick with it. Mixing process managers creates competing lock files and PID conflicts that cause exactly this issue.

Monitor upstream issue #20958

When this issue is closed with a fixed version, update and remove the workaround scripts. The permanent fix will likely include automatic stale-state cleanup on startup.

Keep this page in workaround mode until issue #20958 is closed with a specific fixed version.

Still Stuck?

Stream logs in real time while attempting to connect from the Control UI:

Live debug

openclaw logs --follow | grep -Ei 'websocket|health|timeout|handshake|reconnect'

Run the Doctor

npx clawkit-doctor@latest

Checks gateway health, port availability, lock file state, and LaunchAgent registration.
