Fix: Ollama Local Model Stuck Thinking (CPU / No GPU)
Permanent Thinking: No Error, No Response
OpenClaw shows a spinning "thinking" indicator and never delivers a response. There is no error message: the request reaches Ollama, but Ollama is too slow to respond before the timeout fires.
CPU inference is orders of magnitude slower than GPU. A 7B model that responds in 3 seconds on a GPU can take 5–10 minutes on a modern CPU, long past OpenClaw's default timeout. The fix is either a smaller model or a longer timeout (or both).
Next Step
Fix now, then reduce repeat incidents
If this issue keeps coming back, validate your setup in Doctor first, then harden your config.
Fix A: Switch to a Smaller Model
This is the most impactful change. On CPU-only systems, model size directly controls whether the tool is usable at all:
Pull a smaller model and update your config:
ollama pull llama3.2:1b # or ollama pull qwen2.5:1.5b
{
"llm": {
"provider": "ollama",
"baseUrl": "http://localhost:11434",
"model": "llama3.2:1b"
}
}
Fix B: Increase the Timeout
If you want to keep using a larger model and accept the slower speed, increase requestTimeout to give Ollama enough time to respond:
{
"llm": {
"provider": "ollama",
"baseUrl": "http://localhost:11434",
"model": "llama3.1:8b",
"requestTimeout": 600000
}
}
600000 is 10 minutes (the value is in milliseconds). This is a last resort: the UX will be very slow. Combining a smaller model with a 120s timeout is a much better user experience.
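If you are unsure what value to pick, a rough rule is to time one real request and allow about 3x headroom. A minimal sketch; the 3x factor, the helper name, and the 40-second example latency are illustrative choices, not OpenClaw defaults:

```shell
# suggest_timeout SECONDS: print a requestTimeout value in milliseconds,
# allowing 3x headroom over the observed latency of one real request.
suggest_timeout() {
  echo $(( $1 * 3 * 1000 ))
}

# Example: a reply that took 40 seconds on CPU
suggest_timeout 40
```

A 40-second reply yields 120000, which matches the 120s timeout suggested above for smaller models.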
Fix C: Verify Ollama Is Actually Running
Before tuning timeouts, confirm Ollama is responsive:
# Is Ollama running?
curl http://localhost:11434/api/tags
# Test a real inference (watch the timing)
time curl http://localhost:11434/api/generate \
  -d '{"model":"llama3.2:1b","prompt":"say hi","stream":false}'
What to verify:
/api/tags returns a JSON list. If this fails, Ollama is not running; start it with: ollama serve
/api/generate responds (even if slowly). This confirms model loading and inference work end-to-end.
The model you configured is in the list. Run ollama list; the exact model name must match openclaw.json.
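The reachability check above can be wrapped in a small reusable function. A minimal sketch; the function name and the 5-second connect budget are illustrative, and curl must be installed:

```shell
# ollama_up BASE_URL: succeed if an Ollama server answers at BASE_URL.
# Pass the same baseUrl you configured in openclaw.json.
ollama_up() {
  if curl -sf --max-time 5 "$1/api/tags" > /dev/null; then
    echo "ollama is up at $1"
  else
    echo "ollama unreachable at $1 (start it with: ollama serve)"
    return 1
  fi
}
```

For the default setup, run ollama_up http://localhost:11434 and check its exit status before tuning any timeouts.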
Fix D: Model Name Must Be Exact
If the model name in openclaw.json doesn't match what's pulled in Ollama, the request silently fails. Check the exact name:
ollama list
Use the exact name from the NAME column, including the tag (e.g. llama3.2:1b, not just llama3.2). If the tag is latest, you can omit it or include it; either works.
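To automate the comparison, you can check the configured name against the first column of ollama list. A minimal sketch; the helper name is made up, and the usage line assumes jq is installed and openclaw.json is in the current directory:

```shell
# model_pulled NAME LIST: succeed if NAME appears as the first column of
# LIST (the output of `ollama list`). Tags must match exactly; this
# sketch does not treat the "latest" tag as implicit.
model_pulled() {
  printf '%s\n' "$2" | awk '{print $1}' | grep -qx "$1"
}
```

Usage: model_pulled "$(jq -r .llm.model openclaw.json)" "$(ollama list)" succeeds only when the configured model is actually pulled.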
CPU Performance Expectations
CPU inference is not fast: set realistic expectations
A 2024 AMD Ryzen 9 CPU can do roughly 10–15 tokens/sec on a 1B model. A typical response is 100–300 tokens, so expect 10–30 seconds per reply. This is fine for occasional use, not great for interactive chat.
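You can measure your own machine's rate from the timing fields Ollama includes in a non-streaming /api/generate response: eval_count is the number of tokens generated and eval_duration is the time spent generating them, in nanoseconds. A minimal sketch assuming jq is installed:

```shell
# tokens_per_sec: read an /api/generate JSON response on stdin and print
# the generation rate (eval_count tokens over eval_duration nanoseconds).
tokens_per_sec() {
  jq '.eval_count / (.eval_duration / 1e9)'
}
```

Usage: curl -s http://localhost:11434/api/generate -d '{"model":"llama3.2:1b","prompt":"say hi","stream":false}' | tokens_per_sec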
Run the Doctor
Checks Ollama service status, model availability, and response time.