ClawKit Reliability Toolkit

OpenClaw Cost Optimization 2026

API costs scale with token volume. This guide shows which config changes have the most impact — model selection, token caps, and switching to local inference — without changing agent behavior.

1-Minute Execution Version

Copy the config below, replace YOUR_KEY, and restart the gateway.

Minimal Cost Config (DeepSeek)
{
  "llm": {
    "provider": "deepseek",
    "apiKey": "YOUR_KEY",
    "model": "deepseek-chat",
    "baseURL": "https://api.deepseek.com/v1",
    "maxTokens": 4096,
    "temperature": 0.7
  }
}

DeepSeek V3 input: $0.27/M tokens, output: $1.10/M tokens (as of Feb 2026 — verify at DeepSeek Pricing).
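At those rates, per-request cost is easy to estimate. A quick sketch, with the Feb 2026 figures above hardcoded (verify current prices before relying on them):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one completion at per-million-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# DeepSeek V3 rates as of Feb 2026 (USD per million tokens)
DEEPSEEK_IN, DEEPSEEK_OUT = 0.27, 1.10

# A typical agent turn: 10K tokens of context in, 2K tokens out
cost = request_cost(10_000, 2_000, DEEPSEEK_IN, DEEPSEEK_OUT)
print(f"${cost:.4f}")  # about half a cent per turn
```

The 10K/2K token split is an illustrative assumption; substitute your own averages from your provider's usage dashboard.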

Why Costs Spike Unexpectedly

OpenClaw agents are multi-turn by design. Each tool call or sub-agent spawn triggers new completions that accumulate quickly. Three patterns account for most unexpected cost spikes:

No maxTokens ceiling

Without a token cap, a single runaway task can generate 128K+ tokens (over $1 on GPT-4.1 output alone at $8/M). A cap of 4096 limits worst-case output cost per completion.

Expensive model for every task

Using GPT-4.1 or Sonnet for simple file lookups and summaries costs 10–50× more than a smaller model that handles them equally well. Routing by task type is the highest-leverage change.

Long conversation history replayed

OpenClaw replays the full message history per turn by default. For long sessions, this means paying for the same context on every turn.
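The replay effect compounds: if each turn adds a roughly fixed number of tokens to the history, total billed input tokens grow quadratically with turn count. A rough sketch (the flat 500 tokens added per turn is an assumption for illustration):

```python
def total_input_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    """Total input tokens billed when the full history is replayed each turn.

    Turn k replays everything from turns 1..k, so the total is
    tokens_per_turn * (1 + 2 + ... + turns)."""
    return tokens_per_turn * turns * (turns + 1) // 2

print(total_input_tokens(10))   # 27,500 input tokens over 10 turns
print(total_input_tokens(100))  # 2,525,000 -- 10x the turns, ~92x the tokens
```

This is why a long-lived session can cost far more than ten short ones that do the same work.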

Step 1: Pick the Right Model

The model swap is the single highest-impact change. Use our Model Pricing page for current rates. Rough 2026 tiers:

| Model | Best For | Input / Output (per M tokens) |
| --- | --- | --- |
| GPT-4.1 | Complex reasoning, long doc analysis | $2 / $8 |
| Claude Sonnet 4.6 | Code gen, long context | $3 / $15 |
| GPT-4.1 mini | Summaries, Q&A, routing | $0.40 / $1.60 |
| DeepSeek V3 | Code gen, most agent tasks | $0.27 / $1.10 |
| Ollama (local) | Privacy-sensitive, offline | $0 (compute only) |

Prices approximate. Verify at each provider's pricing page before committing.
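Routing by task type can be as simple as a lookup table. The task categories and mapping below are illustrative assumptions, not an OpenClaw API:

```python
# Hypothetical task-type -> model routing table (names are examples)
ROUTES = {
    "summarize": "gpt-4.1-mini",
    "qa":        "gpt-4.1-mini",
    "codegen":   "deepseek-chat",
    "plan":      "deepseek-chat",
    "analyze":   "gpt-4.1",       # reserve the expensive model for hard tasks
}

def pick_model(task_type: str, default: str = "deepseek-chat") -> str:
    """Route cheap tasks to cheap models; fall back to a cheap default."""
    return ROUTES.get(task_type, default)

print(pick_model("summarize"))  # gpt-4.1-mini
print(pick_model("unknown"))    # deepseek-chat
```

Defaulting unknown task types to the cheap model keeps the expensive tier opt-in rather than opt-out.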

Step 2: Add a maxTokens Cap

This is the fastest safety net. Without it, a single agent loop can exhaust a daily budget in minutes.

Safe maxTokens Config
{
  "llm": {
    "model": "deepseek-chat",
    "maxTokens": 4096
  }
}

Start at 4096 and increase only if you find agents truncating legitimately. Most single-turn tasks need under 2000 output tokens.

Step 3: Switch to DeepSeek V3

DeepSeek V3 handles most OpenClaw workloads — code generation, task planning, JSON structuring — at significantly lower cost than GPT-4-tier models. It is not a reasoning model (no chain-of-thought), so it's not suited for tasks that explicitly need reasoning_effort.

DeepSeek V3 Full Config
{
  "llm": {
    "provider": "deepseek",
    "apiKey": "sk-...",
    "model": "deepseek-chat",
    "baseURL": "https://api.deepseek.com/v1",
    "maxTokens": 4096,
    "temperature": 0.7
  }
}

Get an API key at platform.deepseek.com. For a full setup walkthrough, see our DeepSeek Setup Guide.
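DeepSeek's API is OpenAI-compatible, so a chat request is plain JSON over HTTPS against the baseURL from the config above. A minimal standard-library sketch; the request is only sent if a DEEPSEEK_API_KEY environment variable is set (the prompt and 4096 cap mirror the config, everything else is a generic chat-completions payload):

```python
import json
import os
import urllib.request

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for DeepSeek."""
    body = {
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,   # mirror the maxTokens cap from the config
        "temperature": 0.7,
    }
    return urllib.request.Request(
        "https://api.deepseek.com/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

if key := os.environ.get("DEEPSEEK_API_KEY"):
    with urllib.request.urlopen(build_request("Say hi in five words.", key)) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Useful as a smoke test that your key and baseURL work before pointing OpenClaw at them.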

Step 4: Run Locally with Ollama (Zero API Cost)

For privacy-sensitive tasks or when you want to eliminate API costs entirely, Ollama runs models on your machine. Performance depends on your hardware.

Ollama Config
# 1. Pull a model locally
ollama pull qwen2.5:14b

# 2. Update openclaw config
{
  "llm": {
    "provider": "ollama",
    "model": "qwen2.5:14b",
    "baseURL": "http://localhost:11434/v1",
    "maxTokens": 4096
  }
}

Ollama Limitations

Ollama timeout errors (30s default) are common with larger models on modest hardware. See our Ollama Timeout troubleshooting guide if you hit this.

Generate a Config Automatically

Tell the Config Wizard your provider and budget. It generates a complete openclaw.json with correct fields, token limits, and baseURL.


Step 5: Track Usage Before Optimizing Further

Without usage data, optimization is guesswork. Each provider has built-in usage dashboards:

OpenAI: platform.openai.com/usage
DeepSeek: platform.deepseek.com
Anthropic: console.anthropic.com

Set a monthly spend alert in your provider's billing settings. Most let you trigger an email at a fixed dollar threshold.
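If your provider exposes per-request token counts (most dashboards export them), a local tally can flag overspend before the bill lands. The log format and threshold below are made-up examples:

```python
# Hypothetical usage log: (input_tokens, output_tokens) per request
usage_log = [(12_000, 1_800), (9_500, 2_400), (30_000, 4_096)]

IN_RATE, OUT_RATE = 0.27, 1.10  # DeepSeek V3 $/M, Feb 2026 figures
ALERT_USD = 0.02                # demo threshold; set a real one in provider billing

spend = sum(i * IN_RATE + o * OUT_RATE for i, o in usage_log) / 1_000_000
print(f"spend so far: ${spend:.4f}")
if spend > ALERT_USD:
    print("over local alert threshold; investigate before the provider bill lands")
```

A local check like this complements, rather than replaces, the provider-side billing alert.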

Quick Reference

1. Swap to DeepSeek V3 or GPT-4.1 mini. Highest impact: 60–90% cost reduction for most tasks.

2. Set maxTokens: 4096. Prevents runaway single-task cost.

3. Use Ollama for local-only tasks. Zero API cost; limited by hardware.

4. Set provider spend alerts. Catches unexpected cost spikes early.
