ClawKit Reliability Toolkit

OpenClaw Cost Optimization 2026

API costs scale with token volume. This guide shows which config changes have the most impact — model selection, token caps, and switching to local inference — without changing agent behavior.

1-Minute Execution Version

Copy the config below, replace YOUR_KEY, and restart the gateway.

Minimal Cost Config (DeepSeek)
{
  "llm": {
    "provider": "deepseek",
    "apiKey": "YOUR_KEY",
    "model": "deepseek-chat",
    "baseURL": "https://api.deepseek.com/v1",
    "maxTokens": 4096,
    "temperature": 0.7
  }
}

DeepSeek V3 input: $0.27/M tokens, output: $1.10/M tokens (as of Feb 2026 — verify at DeepSeek Pricing).
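At those rates, per-request cost is easy to estimate. A quick sketch, with the Feb 2026 figures above hardcoded (verify current prices before relying on them):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one completion at per-million-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# DeepSeek V3 rates as of Feb 2026 (USD per million tokens)
DEEPSEEK_IN, DEEPSEEK_OUT = 0.27, 1.10

# A typical agent turn: 10K tokens of context in, 2K tokens out
cost = request_cost(10_000, 2_000, DEEPSEEK_IN, DEEPSEEK_OUT)
print(f"${cost:.4f}")  # about half a cent per turn
```

The 10K/2K token split is an illustrative assumption; substitute your own averages from your provider's usage dashboard.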

Why Costs Spike Unexpectedly

OpenClaw agents are multi-turn by design. Each tool call or sub-agent spawn triggers new completions that accumulate quickly. Three patterns account for most unexpected cost spikes:

No maxTokens ceiling

Without a token cap, a single runaway task can generate 128K+ tokens (over $1 on GPT-4.1 output alone at $8/M). A cap of 4096 limits worst-case output cost per completion.

Expensive model for every task

Using GPT-4.1 or Sonnet for simple file lookups and summaries costs 10–50× more than a smaller model that handles them equally well. Routing by task type is the highest-leverage change.

Long conversation history replayed

OpenClaw replays the full message history per turn by default. For long sessions, this means paying for the same context on every turn.
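The replay effect compounds: if each turn adds a roughly fixed number of tokens to the history, total billed input tokens grow quadratically with turn count. A rough sketch (the flat 500 tokens added per turn is an assumption for illustration):

```python
def total_input_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    """Total input tokens billed when the full history is replayed each turn.

    Turn k replays everything from turns 1..k, so the total is
    tokens_per_turn * (1 + 2 + ... + turns)."""
    return tokens_per_turn * turns * (turns + 1) // 2

print(total_input_tokens(10))   # 27,500 input tokens over 10 turns
print(total_input_tokens(100))  # 2,525,000 -- 10x the turns, ~92x the tokens
```

This is why a long-lived session can cost far more than ten short ones that do the same work.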

Step 1: Pick the Right Model

The model swap is the single highest-impact change. Use our Model Pricing page for current rates. Rough 2026 tiers:

| Model | Best For | Input / Output (per M tokens) |
| --- | --- | --- |
| GPT-4.1 | Complex reasoning, long doc analysis | $2 / $8 |
| Claude Sonnet 4.6 | Code gen, long context | $3 / $15 |
| GPT-4.1 mini | Summaries, Q&A, routing | $0.40 / $1.60 |
| DeepSeek V3 | Code gen, most agent tasks | $0.27 / $1.10 |
| Ollama (local) | Privacy-sensitive, offline | $0 (compute only) |

Prices approximate. Verify at each provider's pricing page before committing.
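Routing by task type can be as simple as a lookup table. The task categories and mapping below are illustrative assumptions, not an OpenClaw API:

```python
# Hypothetical task-type -> model routing table (names are examples)
ROUTES = {
    "summarize": "gpt-4.1-mini",
    "qa":        "gpt-4.1-mini",
    "codegen":   "deepseek-chat",
    "plan":      "deepseek-chat",
    "analyze":   "gpt-4.1",       # reserve the expensive model for hard tasks
}

def pick_model(task_type: str, default: str = "deepseek-chat") -> str:
    """Route cheap tasks to cheap models; fall back to a cheap default."""
    return ROUTES.get(task_type, default)

print(pick_model("summarize"))  # gpt-4.1-mini
print(pick_model("unknown"))    # deepseek-chat
```

Defaulting unknown task types to the cheap model keeps the expensive tier opt-in rather than opt-out.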

Step 2: Add a maxTokens Cap

This is the fastest safety net. Without it, a single agent loop can exhaust a daily budget in minutes.

Safe maxTokens Config
{
  "llm": {
    "model": "deepseek-chat",
    "maxTokens": 4096
  }
}

Start at 4096 and increase only if you find agents truncating legitimately. Most single-turn tasks need under 2000 output tokens.

Step 3: Switch to DeepSeek V3

DeepSeek V3 handles most OpenClaw workloads — code generation, task planning, JSON structuring — at significantly lower cost than GPT-4-tier models. It is not a reasoning model (no chain-of-thought), so it's not suited for tasks that explicitly need reasoning_effort.

DeepSeek V3 Full Config
{
  "llm": {
    "provider": "deepseek",
    "apiKey": "sk-...",
    "model": "deepseek-chat",
    "baseURL": "https://api.deepseek.com/v1",
    "maxTokens": 4096,
    "temperature": 0.7
  }
}

Get an API key at platform.deepseek.com. For a full setup walkthrough, see our DeepSeek Setup Guide.
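DeepSeek's API is OpenAI-compatible, so a chat request is plain JSON over HTTPS against the baseURL from the config above. A minimal standard-library sketch; the request is only sent if a DEEPSEEK_API_KEY environment variable is set (the prompt and 4096 cap mirror the config, everything else is a generic chat-completions payload):

```python
import json
import os
import urllib.request

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for DeepSeek."""
    body = {
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,   # mirror the maxTokens cap from the config
        "temperature": 0.7,
    }
    return urllib.request.Request(
        "https://api.deepseek.com/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

if key := os.environ.get("DEEPSEEK_API_KEY"):
    with urllib.request.urlopen(build_request("Say hi in five words.", key)) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Useful as a smoke test that your key and baseURL work before pointing OpenClaw at them.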

Step 4: Run Locally with Ollama (Zero API Cost)

For privacy-sensitive tasks or when you want to eliminate API costs entirely, Ollama runs models on your machine. Performance depends on your hardware.

Ollama Config
# 1. Pull a model locally
ollama pull qwen2.5:14b

# 2. Update openclaw config
{
  "llm": {
    "provider": "ollama",
    "model": "qwen2.5:14b",
    "baseURL": "http://localhost:11434/v1",
    "maxTokens": 4096
  }
}

Ollama Limitations

Ollama timeout errors (30s default) are common with larger models on modest hardware. See our Ollama Timeout troubleshooting guide if you hit this.

Generate a Config Automatically

Tell the Config Wizard your provider and budget. It generates a complete openclaw.json with correct fields, token limits, and baseURL.


Step 5: Track Usage Before Optimizing Further

Without usage data, optimization is guesswork. Each provider has built-in usage dashboards:

OpenAI: platform.openai.com/usage
DeepSeek: platform.deepseek.com
Anthropic: console.anthropic.com

Set a monthly spend alert in your provider's billing settings. Most let you trigger an email at a fixed dollar threshold.
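If your provider exposes per-request token counts (most dashboards export them), a local tally can flag overspend before the bill lands. The log format and threshold below are made-up examples:

```python
# Hypothetical usage log: (input_tokens, output_tokens) per request
usage_log = [(12_000, 1_800), (9_500, 2_400), (30_000, 4_096)]

IN_RATE, OUT_RATE = 0.27, 1.10  # DeepSeek V3 $/M, Feb 2026 figures
ALERT_USD = 0.02                # demo threshold; set a real one in provider billing

spend = sum(i * IN_RATE + o * OUT_RATE for i, o in usage_log) / 1_000_000
print(f"spend so far: ${spend:.4f}")
if spend > ALERT_USD:
    print("over local alert threshold; investigate before the provider bill lands")
```

A local check like this complements, rather than replaces, the provider-side billing alert.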

Quick Reference

1. Swap to DeepSeek V3 or GPT-4.1 mini. Highest impact: 60–90% cost reduction for most tasks.

2. Set maxTokens: 4096. Prevents runaway single-task cost.

3. Use Ollama for local-only tasks. Zero API cost; limited by hardware.

4. Set provider spend alerts. Catches unexpected cost spikes early.
