# OpenClaw Cost Optimization 2026
API costs scale with token volume. This guide shows which config changes have the most impact — model selection, token caps, and switching to local inference — without changing agent behavior.
## 1-Minute Execution Version
Copy, replace YOUR_KEY, restart gateway.
```json
{
  "llm": {
    "provider": "deepseek",
    "apiKey": "YOUR_KEY",
    "model": "deepseek-chat",
    "baseURL": "https://api.deepseek.com/v1",
    "maxTokens": 4096,
    "temperature": 0.7
  }
}
```

DeepSeek V3 input: $0.27/M tokens; output: $1.10/M tokens (as of Feb 2026 — verify at DeepSeek Pricing).
## Why Costs Spike Unexpectedly
OpenClaw agents are multi-turn by design. Each tool call or sub-agent spawns new completions that accumulate fast. Three patterns account for most unexpected cost spikes:
1. **No `maxTokens` ceiling.** Without a token cap, a single runaway task can generate 128K+ tokens (about $1.02 on GPT-4.1 output alone, at $8/M). A cap of 4096 limits worst-case output cost.
2. **An expensive model for every task.** Using GPT-4.1 or Sonnet for simple file lookups and summaries costs 10–50× more than smaller models. Routing by task type is the highest-leverage change.
3. **Long conversation history replayed.** OpenClaw replays the full message history on every turn by default, so long sessions pay for the same context again and again.
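The history-replay pattern can be mitigated by trimming the context before each turn. The sketch below is a hypothetical sliding-window trimmer, not an OpenClaw built-in: it keeps the system prompt plus the most recent messages that fit a token budget, using a rough 4-characters-per-token estimate in place of a real tokenizer.

```python
# Hypothetical sketch: cap replayed history to a fixed token budget.
# `estimate_tokens` is a crude stand-in (~4 chars per token); a real
# gateway would use the provider's tokenizer.

def estimate_tokens(message: dict) -> int:
    return max(1, len(message["content"]) // 4)

def trim_history(messages: list[dict], budget: int = 8000) -> list[dict]:
    """Keep the system prompt plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(estimate_tokens(m) for m in system)
    for msg in reversed(rest):          # walk newest -> oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break                       # older turns get dropped
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

A fixed window loses older context, so reserve this for sessions where early turns are genuinely stale.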
## Step 1: Pick the Right Model
The model swap is the single highest-impact change. Use our Model Pricing page for current rates. Rough 2026 tiers:
| Model | Best For | Input / Output |
|---|---|---|
| GPT-4.1 | Complex reasoning, long doc analysis | $2 / $8 /M |
| Claude Sonnet 4.6 | Code gen, long context | $3 / $15 /M |
| GPT-4.1 mini | Summaries, Q&A, routing | $0.40 / $1.60 /M |
| DeepSeek V3 | Code gen, most agent tasks | $0.27 / $1.10 /M |
| Ollama (local) | Privacy-sensitive, offline | $0 (compute only) |
Prices are approximate; verify at each provider's pricing page before committing.
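To see what the tiers mean in practice, the back-of-envelope calculation below compares per-turn cost using the table's rates. The 3,000-input / 800-output turn size is an assumption, not a measured OpenClaw average.

```python
# Per-turn cost comparison using the table's rates (USD per million tokens).
# Assumes a typical agent turn of ~3,000 input and ~800 output tokens.

RATES = {                      # (input, output) in $/M tokens
    "gpt-4.1":       (2.00, 8.00),
    "claude-sonnet": (3.00, 15.00),
    "gpt-4.1-mini":  (0.40, 1.60),
    "deepseek-chat": (0.27, 1.10),
}

def turn_cost(model: str, input_tokens: int = 3000, output_tokens: int = 800) -> float:
    inp, out = RATES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

for model in RATES:
    print(f"{model:14s} ${turn_cost(model):.5f} per turn")
```

At these rates a GPT-4.1 turn (~$0.0124) costs roughly 7× a DeepSeek V3 turn (~$0.0017), which is where the bulk of the savings comes from.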
## Step 2: Add a maxTokens Cap
This is the fastest safety net. Without it, a single agent loop can exhaust a daily budget in minutes.
```json
{
  "llm": {
    "model": "deepseek-chat",
    "maxTokens": 4096
  }
}
```

Start at 4096 and raise it only if agents are truncating legitimate output. Most single-turn tasks need under 2,000 output tokens.
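The cap's effect is easy to quantify: worst-case output cost per completion is simply `maxTokens` times the output rate. A quick sketch, using the GPT-4.1 output rate from the table above:

```python
# Worst-case output cost for one completion = maxTokens * output rate ($/M).

def worst_case_output_cost(max_tokens: int, output_rate_per_m: float) -> float:
    return max_tokens * output_rate_per_m / 1_000_000

# Uncapped 128K-token runaway on GPT-4.1 output ($8/M) vs a 4096 cap:
uncapped = worst_case_output_cost(128_000, 8.00)   # ~$1.02
capped = worst_case_output_cost(4_096, 8.00)       # ~$0.03
```

The cap turns a dollar-scale worst case into a few cents per call.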
## Step 3: Switch to DeepSeek V3
DeepSeek V3 handles most OpenClaw workloads — code generation, task planning, JSON structuring — at significantly lower cost than GPT-4-tier models. It is not a reasoning model (no chain-of-thought), so it's not suited for tasks that explicitly need `reasoning_effort`.
```json
{
  "llm": {
    "provider": "deepseek",
    "apiKey": "sk-...",
    "model": "deepseek-chat",
    "baseURL": "https://api.deepseek.com/v1",
    "maxTokens": 4096,
    "temperature": 0.7
  }
}
```

Get an API key at platform.deepseek.com. For a full setup walkthrough, see our DeepSeek Setup Guide.
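Because V3 is not a reasoning model, a practical pattern is to keep it as the default and escalate only the rare hard tasks. The sketch below is illustrative — the task labels are hypothetical, not an OpenClaw API:

```python
# Hypothetical routing sketch: cheap default, escalate reasoning-heavy tasks.
# Task-type labels are made up for illustration.

REASONING_TASKS = {"planning_deep", "math_proof", "long_doc_analysis"}

def pick_model(task_type: str) -> str:
    if task_type in REASONING_TASKS:
        return "gpt-4.1"          # escalate the rare hard tasks
    return "deepseek-chat"        # default for code gen, JSON, lookups
```

If most traffic is code generation and structuring, the expensive model ends up handling only a small fraction of turns.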
## Step 4: Run Locally with Ollama (Zero API Cost)
For privacy-sensitive tasks or when you want to eliminate API costs entirely, Ollama runs models on your machine. Performance depends on your hardware.
1. Pull a model locally:

```bash
ollama pull qwen2.5:14b
```

2. Update the openclaw config:

```json
{
  "llm": {
    "provider": "ollama",
    "model": "qwen2.5:14b",
    "baseURL": "http://localhost:11434/v1",
    "maxTokens": 4096
  }
}
```

### Ollama Limitations

Ollama timeout errors (30s default) are common with larger models on modest hardware. See our Ollama Timeout troubleshooting guide if you hit this.
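If you call the local endpoint from your own tooling, a retry with exponential backoff absorbs the occasional cold-load timeout. This is a generic sketch, not OpenClaw's built-in behavior — wire it around whatever client call you use:

```python
# Generic retry-with-backoff sketch for flaky local inference calls
# (e.g. Ollama timing out while a large model loads).
import time

def with_retries(call, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Retry `call` on TimeoutError, doubling the delay each attempt."""
    for i in range(attempts):
        try:
            return call()
        except TimeoutError:
            if i == attempts - 1:
                raise                      # out of attempts, surface the error
            sleep(base_delay * (2 ** i))   # 1s, 2s, 4s, ...
```

The `sleep` parameter is injectable so the wrapper is easy to test; in production just leave the default.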
## Generate a Config Automatically
Tell the Config Wizard your provider and budget. It generates a complete openclaw.json with correct fields, token limits, and baseURL.
## Step 5: Track Usage Before Optimizing Further
Without usage data, optimization is guesswork. Each provider has a built-in usage dashboard in its console.
Set a monthly spend alert in your provider's billing settings. Most let you trigger an email at a fixed dollar threshold.
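Provider-side alerts can lag; a lightweight local tracker catches runaway spend sooner. A minimal sketch, assuming you can hook per-call token counts (rates in $/M tokens, as in the table above):

```python
# Minimal local spend tracker: accumulate per-call cost, flag when a
# monthly limit is crossed. Complements, not replaces, provider alerts.

class SpendTracker:
    def __init__(self, monthly_limit_usd: float):
        self.limit = monthly_limit_usd
        self.spent = 0.0

    def record(self, input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> bool:
        """Add one call's cost; return True once the limit is reached."""
        self.spent += (input_tokens * input_rate +
                       output_tokens * output_rate) / 1_000_000
        return self.spent >= self.limit
```

Hook `record()` wherever your gateway logs completions, and page yourself when it returns `True`.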
## Quick Reference

| Change | Impact |
|---|---|
| Swap to DeepSeek V3 or GPT-4.1 mini | Highest impact — 60–90% cost reduction for most tasks |
| Set `maxTokens: 4096` | Prevents runaway single-task cost |
| Use Ollama for local-only tasks | Zero API cost; limited by hardware |
| Set provider spend alerts | Catches unexpected cost spikes early |