Gateway WebSocket Timeout After Reboot
Known Open Issue
This behavior is tracked in openclaw/openclaw#20958. The Gateway starts after reboot but the WebSocket health-check handshake never completes — the UI shows repeated reconnect attempts and the status oscillates between "connecting" and "timeout".
After a machine reboot, the Gateway process starts normally (you can see it in ps aux or launchctl list), but clients cannot establish a stable WebSocket connection. The health endpoint may respond but the handshake never upgrades to a persistent channel.
Next Step
Fix now, then reduce repeat incidents
If this issue keeps coming back, validate your setup in Doctor first, then harden your config.
What the Error Looks Like
After a reboot, the Gateway logs and the Control UI show repeated reconnect attempts, health-check timeouts, and a status that oscillates between "connecting" and "timeout".
The key clue is that the Gateway process is running (you can verify with ps aux | grep openclaw) but WebSocket connections never stabilize. This distinguishes it from a process crash (where the process is missing) or a port conflict (where you'd see EADDRINUSE).
Why This Happens
Stale WebSocket state file
The Gateway writes connection state to a file during normal operation. On unclean shutdown (power loss, kernel panic, force restart), this file is not cleaned up. When the Gateway starts again, it reads the stale state and tries to resume connections that no longer exist — causing health-check loops.
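Because any state file written before the current boot cannot describe live connections, staleness can be decided by comparing the file's mtime to the boot time. A Linux-only sketch (the path is a demo placeholder, not openclaw's real state file):

```shell
#!/bin/sh
# Sketch: treat a ws-state file as stale if it was written before the
# current boot. Demo path only; not openclaw's actual implementation.
STATE=/tmp/demo-ws-state.json
echo '{"connections":[]}' > "$STATE"        # demo file created "this boot"

boot_epoch=$(date -d "$(uptime -s)" +%s)    # epoch seconds of last boot (GNU date + procps)
file_epoch=$(stat -c %Y "$STATE")           # state file mtime in epoch seconds

if [ "$file_epoch" -lt "$boot_epoch" ]; then
  rm -f "$STATE"
  echo "stale state removed"
else
  echo "state file is from this boot; keeping it"
fi
```

`uptime -s` and `stat -c %Y` are Linux-specific; on macOS you would need `sysctl kern.boottime` and `stat -f %m` instead.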
Lock file from previous PID
The Gateway uses a PID lock file to prevent duplicate instances. After a reboot the old PID is invalid, but the lock file persists. The new process detects the lock and enters a degraded mode in which it starts but never fully initializes the WebSocket listener.
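The degraded-mode trap comes down to one check: is the PID recorded in the lock file still alive? A minimal cleanup sketch (path and behavior are illustrative assumptions, not openclaw's actual implementation):

```shell
#!/bin/sh
# Sketch: remove a PID lock file only when the recorded PID is no longer
# alive. `kill -0` probes for process existence without sending a signal.
cleanup_stale_lock() {
  lock="$1"
  [ -f "$lock" ] || return 0
  pid=$(cat "$lock" 2>/dev/null)
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "lock held by live PID $pid; leaving it"
  else
    rm -f "$lock"
    echo "removed stale lock: $lock"
  fi
}

# Demo with a lock written by a PID that cannot exist:
echo 99999999 > /tmp/demo-gateway.lock
cleanup_stale_lock /tmp/demo-gateway.lock
```

A script like this could run before the gateway starts (for example, from your start script), so stale locks are cleared while live locks are respected.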
LaunchAgent timing race
On macOS, the LaunchAgent may start the Gateway before the network stack is fully ready. The initial WebSocket bind succeeds on localhost but external connections fail until the network interface is up. By then, the Gateway is stuck in a retry loop.
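One workaround for this race is to wait for a default route before launching the gateway. Whether your agent's ProgramArguments can be wrapped this way depends on how the plist is managed, so treat this as a sketch:

```xml
<!-- In the gateway LaunchAgent plist: wrap the start command in a shell
     loop that waits (up to ~60 s) for a default route before exec'ing. -->
<key>ProgramArguments</key>
<array>
    <string>/bin/sh</string>
    <string>-c</string>
    <string>i=0; until route -n get default >/dev/null 2>&amp;1 || [ $i -ge 60 ]; do sleep 1; i=$((i+1)); done; exec openclaw gateway start</string>
</array>
```

`route -n get default` fails while no default route exists, which makes it a cheap network-readiness probe on macOS.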
Fix A: macOS (LaunchAgent)
This is the most common scenario. The LaunchAgent starts the Gateway but the WebSocket channel is stuck.
launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway
rm -f ~/.openclaw/gateway/.lock
rm -f ~/.openclaw/gateway/ws-state.json
openclaw gateway status
openclaw doctor
The -k flag in launchctl kickstart kills the existing process before restarting — this is critical. Without it, launchctl sees the process is "running" and does nothing.
Fix B: Linux (systemd)
sudo systemctl restart openclaw-gateway
rm -f ~/.openclaw/gateway/.lock
rm -f ~/.openclaw/gateway/ws-state.json
systemctl status openclaw-gateway
openclaw gateway status
openclaw doctor
If systemctl restart reports "service not found," the unit file was never installed. Run openclaw gateway install first, then sudo systemctl enable --now openclaw-gateway.
Fix C: Docker
Docker containers don't survive reboots unless you set a restart policy. If your container stopped on reboot, that's the cause — not a WebSocket bug.
# Check container status
docker ps -a | grep openclaw

# If status is "Exited", restart
docker compose up -d openclaw-gateway

# Add a restart policy to prevent this, in docker-compose.yml:
#   services:
#     openclaw-gateway:
#       restart: unless-stopped
If the container is running but WebSocket still times out, exec into it and clear state:
docker exec openclaw-gateway rm -f /app/.openclaw/gateway/.lock
docker compose restart openclaw-gateway
Verify Recovery
After running the fix for your platform, verify that all three checks pass:

1. openclaw gateway status
Expected: Status: running, Port: 18789, Health: ok

2. openclaw logs --tail 100 | grep -Ei "timeout|reconnect|restart|health"
Expected: No output (empty = good)

3. Open the Control UI. The connection indicator should turn green and stay green for 2+ minutes.

If any check fails, run openclaw doctor for a full diagnostic.
Prevent Recurrence
Add a post-boot health check script
Create a script that runs 30 seconds after boot, checks openclaw gateway status, and runs launchctl kickstart -k if health is not "ok". On macOS, add it as a separate LaunchAgent with RunAtLoad and a StartInterval.
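Such a watchdog LaunchAgent could look like the sketch below. The label, the inline shell one-liner, and matching on "Health: ok" are assumptions based on this guide; adjust them to your setup. Note that StartInterval re-runs the check every 30 seconds, which also covers the "30 seconds after boot" case:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>local.openclaw.gateway-healthcheck</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/sh</string>
        <string>-c</string>
        <string>openclaw gateway status | grep -q 'Health: ok' || launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>StartInterval</key>
    <integer>30</integer>
</dict>
</plist>
```

Save it under ~/Library/LaunchAgents/ and load it with launchctl; the kickstart target must match the name of your gateway LaunchAgent.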
Enable automatic lock cleanup
Add rm -f ~/.openclaw/gateway/.lock to your gateway start script or systemd ExecStartPre directive. This ensures stale locks from unclean shutdowns are always cleared before the gateway process starts.
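On Linux, a hypothetical systemd drop-in (created with sudo systemctl edit openclaw-gateway) could carry that cleanup; the unit name comes from this guide, and the home path is a placeholder to adjust to the account that runs the gateway:

```ini
# /etc/systemd/system/openclaw-gateway.service.d/override.conf
[Service]
# The '-' prefix tells systemd not to fail the start if the files are absent.
ExecStartPre=-/bin/rm -f /home/youruser/.openclaw/gateway/.lock
ExecStartPre=-/bin/rm -f /home/youruser/.openclaw/gateway/ws-state.json
```

If you edit the file by hand instead of using systemctl edit, run sudo systemctl daemon-reload afterwards.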
Pin to a single process manager
Don't mix launchctl, systemd, and manual openclaw gateway start. Pick one and stick with it. Mixing process managers creates competing lock files and PID conflicts that cause exactly this issue.
Monitor upstream issue #20958
When the issue is closed with a fixed version, upgrade openclaw and remove the workaround scripts. The permanent fix will likely include automatic stale-state cleanup on startup.
Keep this page in workaround mode until issue #20958 is closed with a specific fixed version.
Still Stuck?
Stream logs in real-time while attempting to connect from the Control UI:
openclaw logs --follow | grep -Ei 'websocket|health|timeout|handshake|reconnect'
Run the Doctor
openclaw doctor
Checks gateway health, port availability, lock file state, and LaunchAgent registration.