ClawKit Reliability Toolkit

Fix: Ollama Local Model Stuck Thinking (CPU / No GPU)

Permanent Thinking: No Error, No Response

OpenClaw shows a spinning "thinking" indicator and never delivers a response. There's no error message: the request reaches Ollama, but Ollama is too slow to respond before the timeout fires.

CPU inference is orders of magnitude slower than GPU. A 7B model that responds in 3 seconds on a GPU can take 5–10 minutes on a modern CPU, long past OpenClaw's default timeout. The fix is either a smaller model or a longer timeout (or both).

Next Step

Fix now, then reduce repeat incidents

If this issue keeps coming back, validate your setup in Doctor first, then harden your config.

What You See

OpenClaw chat: thinking… (spins forever, no error)
openclaw logs: [llm] request sent to ollama — no response yet (60s)
ollama logs: llama_model_load: loading model… (still loading)

Fix A: Switch to a Smaller Model

This is the most impactful change. On CPU-only systems, model size directly controls whether the tool is usable at all:

| Model | Download size | Typical CPU response | Verdict |
| --- | --- | --- | --- |
| llama3.2:1b | ~800 MB | 5–30 s | Good |
| qwen2.5:1.5b | ~1 GB | 10–45 s | Good |
| phi3:mini | ~2.3 GB | 20–90 s | Acceptable |
| llama3.2:3b | ~2 GB | 30–120 s | Slow |
| llama3.1:8b | ~4.7 GB | 5–15 min | Impractical |

Pull a smaller model and update your config:

Pull a CPU-friendly model
ollama pull llama3.2:1b
# or
ollama pull qwen2.5:1.5b
openclaw.json — Use Smaller Model
{
  "llm": {
    "provider": "ollama",
    "baseUrl": "http://localhost:11434",
    "model": "llama3.2:1b"
  }
}

Fix B: Increase the Timeout

If you want to keep using a larger model and accept the slower speed, increase requestTimeout to give Ollama enough time to respond:

openclaw.json — Long Timeout for CPU
{
  "llm": {
    "provider": "ollama",
    "baseUrl": "http://localhost:11434",
    "model": "llama3.1:8b",
    "requestTimeout": 600000
  }
}

requestTimeout is in milliseconds, so 600000 is 10 minutes. Treat this as a last resort: the UX will be very slow. Combining a smaller model with a 120-second timeout is a much better user experience.
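As a sketch, the combined approach (Fix A's smaller model plus a moderate timeout) would look like this; the 120000 ms value is illustrative, so tune it to your hardware:

```json
{
  "llm": {
    "provider": "ollama",
    "baseUrl": "http://localhost:11434",
    "model": "llama3.2:1b",
    "requestTimeout": 120000
  }
}
```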

Fix C: Verify Ollama Is Actually Running

Before tuning timeouts, confirm Ollama is responsive:

Check Ollama status
# Is Ollama running?
curl http://localhost:11434/api/tags

# Test a real inference (watch the timing)
time curl http://localhost:11434/api/generate \
  -d '{"model":"llama3.2:1b","prompt":"say hi","stream":false}'
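A healthy /api/tags response is JSON with a models array. As a sketch of how to pull out just the model names without extra tooling, the snippet below parses a sample payload with grep and cut (the payload here is illustrative, not real Ollama output, which includes more fields per model):

```shell
# Sample /api/tags-style payload (illustrative)
resp='{"models":[{"name":"llama3.2:1b"},{"name":"qwen2.5:1.5b"}]}'

# grep -o prints each "name":"..." match on its own line;
# cut takes the 4th double-quote-delimited field, i.e. the name value
printf '%s\n' "$resp" | grep -o '"name":"[^"]*"' | cut -d'"' -f4
```

In practice you would pipe the output of `curl http://localhost:11434/api/tags` into the same grep/cut chain.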

What to verify:

- /api/tags returns a JSON list. If this fails, Ollama is not running; start it with ollama serve.
- /api/generate responds, even if slowly. This confirms model loading and inference work end-to-end.
- The model you configured is in the list. Run ollama list; the exact model name must match openclaw.json.

Fix D: Model Name Must Be Exact

If the model name in openclaw.json doesn't match what's pulled in Ollama, the request silently fails. Check the exact name:

List downloaded models
ollama list

Use the exact name from the NAME column, including the tag (e.g. llama3.2:1b, not just llama3.2). If the tag is latest, you can omit it or include it; either works.
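The exact-match requirement can be checked mechanically. A minimal sketch, assuming the pulled model names are already in a variable (in practice you would feed in the NAME column of `ollama list` output):

```shell
# Names as they would appear in `ollama list` (illustrative list)
pulled="llama3.2:1b
qwen2.5:1.5b"

configured="llama3.2:1b"   # value from openclaw.json, tag included

# grep -x demands a whole-line match, so a bare "llama3.2" would NOT match
if printf '%s\n' "$pulled" | grep -qx "$configured"; then
  echo "ok: $configured is pulled"
else
  echo "missing: run ollama pull $configured"
fi
```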

CPU Performance Expectations

CPU inference is not fast: set realistic expectations

A 2024 AMD Ryzen 9 CPU can do roughly 10–15 tokens/sec on a 1B model. A typical response is 100–300 tokens, so expect 10–30 seconds per reply. This is fine for occasional use, not great for interactive chat.
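The arithmetic behind that estimate is simply seconds per reply ≈ response tokens ÷ tokens per second. A quick sketch with mid-range numbers from above:

```shell
# 200-token reply at 12 tokens/sec (mid-range of the figures above)
tokens=200
tps=12
echo "$((tokens / tps)) seconds per reply"   # integer division: 16 seconds
```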

Run the Doctor

npx clawkit-doctor@latest

Checks Ollama service status, model availability, and response time.
