ClawKit Reliability Toolkit

Production Agent Patterns

Community-validated patterns

These 6 patterns were extracted from real production deployments shared in the OpenClaw community. They address the failure modes that appear after the first 48 hours of autonomous operation — not setup, but sustained reliability.

Running an agent for 15 minutes is easy. Running one reliably for days, with crash recovery, cost control, and no goal drift, requires specific structural choices.

1. The 5-File Memory Split

The default pattern of keeping everything in a single CLAUDE.md file works fine for short tasks but degrades badly over hours of autonomous work. When that file exceeds ~50 KB the model starts ignoring sections in the middle, and when it exceeds 200 KB you get outright context confusion. The fix is splitting memory into five purpose-specific files so the agent loads only what it needs per phase.

Recommended workspace memory layout
~/.openclaw/workspace/
├── SOUL.md               # Agent identity + startup instructions (always loaded)
├── active-tasks.md       # Current work queue — read FIRST on every restart
├── mistakes.md           # Failures and lessons — never repeat these
├── self-critique.md      # Periodic self-review log — reviewed every 4h
├── project-state.md      # Long-lived project facts, decisions, architecture
└── daily-logs/
    ├── 2026-01-15.md
    ├── 2026-01-16.md
    └── ...               # 7-day rotation; older files auto-archived
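The phrase "loads only what it needs per phase" can be made concrete with a phase-aware loader. The minimal Python sketch below takes only one rule from the layout above: SOUL.md is always loaded first. The phase names and the phase-to-file mapping are hypothetical illustrations, not OpenClaw configuration.

```python
from pathlib import Path
from typing import Dict, List

# SOUL.md is always loaded first, per the workspace layout.
ALWAYS_LOADED = ["SOUL.md"]

# Hypothetical phase -> memory-file mapping; adapt names to your own workflow.
PHASE_FILES: Dict[str, List[str]] = {
    "startup": ["active-tasks.md", "mistakes.md"],
    "self_review": ["self-critique.md", "mistakes.md"],
    "planning": ["active-tasks.md", "project-state.md"],
}


def files_for_phase(phase: str) -> List[str]:
    """Files to load for a phase, SOUL.md first. Raises KeyError for unknown phases."""
    return ALWAYS_LOADED + PHASE_FILES[phase]


def load_context(workspace: Path, phase: str) -> str:
    """Concatenate the phase's files that exist, each under a '=== name ===' header."""
    parts = []
    for name in files_for_phase(phase):
        path = workspace / name
        if path.exists():
            parts.append(f"=== {name} ===\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

The useful property is what stays out. In the `startup` phase, project-state.md and the daily logs never enter the context.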
active-tasks.md starter template
# Active Tasks
<!-- Agent: read this file FIRST on every session start -->
<!-- Update task status inline; never delete completed rows, mark DONE -->

## Current Sprint
| ID  | Task                          | Status      | Started     | Blocked By |
|-----|-------------------------------|-------------|-------------|------------|
| T01 | Draft weekly newsletter       | IN PROGRESS | 2026-01-15  | —          |
| T02 | Scrape competitor pricing     | QUEUED      | —           | T01        |

## Done This Week
| ID  | Task                          | Completed   |
|-----|-------------------------------|-------------|
| T00 | Set up RSS fetch pipeline     | 2026-01-14  |

Daily log rotation matters

Configure daily-logs/ to archive entries older than 7 days to cold storage (e.g., ~/.openclaw/archive/). Leaving 30+ days of logs in the active workspace causes the same context bloat problem the split was designed to prevent.
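Here is a minimal rotation sketch. It assumes daily logs are named `YYYY-MM-DD.md`, as in the layout above, and that the archive directory you pass in (for example `~/.openclaw/archive/`) is the cold-storage target. The function name is hypothetical. You could call it from a nightly cron job.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path
from typing import List


def rotate_daily_logs(logs_dir: Path, archive_dir: Path, today: date, keep_days: int = 7) -> List[str]:
    """Move YYYY-MM-DD.md logs dated before (today - keep_days) into archive_dir."""
    cutoff = today - timedelta(days=keep_days)
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved: List[str] = []
    for path in sorted(logs_dir.glob("*.md")):
        try:
            log_date = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not a dated daily log (e.g. scout-2026-01-15.md); leave it alone
        if log_date < cutoff:
            shutil.move(str(path), str(archive_dir / path.name))
            moved.append(path.name)
    return moved
```

Files whose names are not a bare date are skipped. If other jobs write per-day files into daily-logs/, such as scout or summary outputs, those files need their own rotation rule.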

2. Skill Descriptions with "Use When / Don't Use When"

When an agent has more than five skills registered, wrong-tool invocation becomes a real cost and correctness problem. The agent calls a slow, expensive web-search skill when a fast local-file skill would do, or invokes a destructive database write when a read was intended. Embedding explicit USE WHEN and DON'T USE WHEN guards in each skill's description cuts mis-invocations by roughly half in multi-skill deployments.

Skill definition with USE WHEN guards
name: web_search
description: |
  Searches the public web for current information.

  USE WHEN:
    - The query requires facts published after your training cutoff
    - You need live prices, recent news, or current documentation
    - No local cached version of the data exists

  DON'T USE WHEN:
    - The information is already in project-state.md or daily-logs/
    - The task only needs file manipulation or code edits
    - You are inside a sandboxed session with no outbound access

  COST: ~$0.003 per call (external model, slow). Prefer local skills first.

entrypoint: skills/web_search.py
model: claude-opus-4-6   # strongest model — handles untrusted web content

Tip: Add a COST line to every skill description. The agent uses this signal to prefer cheaper alternatives when multiple skills could satisfy a request — without needing special routing logic.
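When skill descriptions are kept as plain text, a small lint step can stop the convention from eroding as new skills are added. This Python sketch checks for the three markers used in the `web_search` example and extracts the per-call COST figure. The helper names are hypothetical. The `cheapest` helper only illustrates the preference the tip describes, because in practice the agent makes that choice from the description text itself.

```python
import re
from typing import Dict, List, Optional

# Section markers from the web_search example, anchored to line starts so that
# "USE WHEN:" is not matched inside "DON'T USE WHEN:".
REQUIRED = {
    "USE WHEN": r"^\s*USE WHEN:",
    "DON'T USE WHEN": r"^\s*DON'T USE WHEN:",
    "COST": r"^\s*COST:",
}
COST_RE = re.compile(r"^\s*COST:\s*~?\$(\d+(?:\.\d+)?)", re.MULTILINE)


def missing_guards(description: str) -> List[str]:
    """Names of required sections absent from a skill description."""
    return [name for name, pattern in REQUIRED.items()
            if not re.search(pattern, description, re.MULTILINE)]


def cost_per_call(description: str) -> Optional[float]:
    """Dollar figure from a 'COST: ~$X' line, or None if absent or unparseable."""
    match = COST_RE.search(description)
    return float(match.group(1)) if match else None


def cheapest(skills: Dict[str, str]) -> str:
    """Skill name with the lowest stated COST; skills without a COST sort last."""
    def key(name: str):
        cost = cost_per_call(skills[name])
        return (cost is None, cost if cost is not None else 0.0)
    return min(skills, key=key)
```

For example, given `web_search` at ~$0.003 and a hypothetical `local_read` skill at a lower stated cost, `cheapest` returns `local_read`.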

3. Heartbeat Checklist — Every 30 Minutes, Under 20 Lines

Goal drift is subtle: the agent does not crash, it just starts optimizing for something slightly different from what you intended. After 2–3 hours of autonomous work without a checkpoint, agents routinely spend effort on self-assigned sub-tasks that were never in scope. A short heartbeat routine — injected as a cron skill or a timed prompt — forces re-alignment without interrupting flow. Keep it under 20 lines so the agent actually runs it fully rather than skimming.

Heartbeat checklist injected every 30 min
# HEARTBEAT — run every 30 minutes
# Stop current work, answer each question, then resume.

CHECKLIST:
[ ] 1. Re-read active-tasks.md top 3 rows. Am I still working on the first IN PROGRESS task (T01 in the template)?
[ ] 2. Is any task marked IN PROGRESS for more than 2 hours? -> log blocker and escalate.
[ ] 3. Has any session file grown past 2 MB? -> summarize and archive it now.
[ ] 4. Is it time for my 4-hour self-review? (check last entry in self-critique.md)
[ ] 5. Did I make any mistake in the last 30 min? -> log it in mistakes.md before continuing.
[ ] 6. What is the single most important next action? Write it as the first line of active-tasks.md.

DONE -> resume work. Total time budget for this checklist: 90 seconds.
Cron entry for heartbeat
name: heartbeat
schedule: "*/30 * * * *"   # every 30 minutes
session: isolated           # own session — does NOT share context with main agent
prompt_file: prompts/heartbeat-checklist.md
model: claude-haiku-4-5    # fast, cheap; this is internal bookkeeping
write_output_to: daily-logs/heartbeat.log
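Some checklist items do not need a model at all. Item 3, for example, is a file-size check. A wrapper could compute it before the heartbeat prompt runs and prepend the result as facts. This sketch makes three assumptions: 2 MB means 2 MiB (2 * 1024 * 1024 bytes), it scans the whole workspace, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List

TWO_MB = 2 * 1024 * 1024  # checklist item 3 threshold, read here as 2 MiB


def oversized_files(root: Path, limit: int = TWO_MB) -> List[Path]:
    """Files anywhere under root larger than `limit` bytes, largest first."""
    hits = [p for p in root.rglob("*") if p.is_file() and p.stat().st_size > limit]
    return sorted(hits, key=lambda p: p.stat().st_size, reverse=True)


def heartbeat_facts(workspace: Path) -> str:
    """Short fact block a wrapper could prepend to the heartbeat prompt."""
    lines = [
        f"- OVERSIZED: {p.relative_to(workspace)} ({p.stat().st_size // 1024} KB)"
        for p in oversized_files(workspace)
    ]
    return "\n".join(lines) if lines else "- No files over 2 MB."
```

Handing the model precomputed facts keeps the checklist within its 90-second budget. It also avoids spending tokens on directory listings.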

Do not skip the 4-hour self-review. The heartbeat every 30 min catches tactical drift. The self-critique every 4 hours catches strategic drift — the agent pursuing a coherent but wrong objective. Log entries in self-critique.md with timestamps so you can audit the agent's reasoning post-hoc.
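Checking whether the 4-hour review is overdue (checklist item 4) is easy to automate if self-critique.md entries start with a fixed timestamp heading. The sketch below assumes headings such as `## 2026-01-15 12:00` in local time. That format is an assumption; the pattern itself does not prescribe one.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed entry heading format in self-critique.md: "## YYYY-MM-DD HH:MM"
ENTRY_RE = re.compile(r"^## (\d{4}-\d{2}-\d{2} \d{2}:\d{2})", re.MULTILINE)


def last_review(text: str) -> Optional[datetime]:
    """Most recent entry timestamp in self-critique.md, or None if there are no entries."""
    stamps = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in ENTRY_RE.findall(text)]
    return max(stamps, default=None)


def review_overdue(text: str, now: datetime, interval: timedelta = timedelta(hours=4)) -> bool:
    """True when no review exists or the last one is at least `interval` old."""
    last = last_review(text)
    return last is None or now - last >= interval
```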

4. Isolated Cron Sessions — No Context Bleed

When multiple scheduled jobs share a session, earlier jobs pollute the context of later ones. A 6 AM content scout that fetches 40 articles will leave those articles in context when the 8 AM news summary runs — causing the summary to over-index on whatever the scout found rather than fresh inputs. Each cron job must start a clean, isolated session with only the files it explicitly declares as inputs.

Two isolated cron jobs — no shared session
# content-scout.yaml
name: content_scout
schedule: "0 6 * * *"     # 6:00 AM daily
session: isolated           # fresh session, zero shared context
context_files:
  - project-state.md        # only what this job needs
  - skills/rss-fetch.yaml
prompt: |
  Fetch today's RSS feeds. Score each item 1-10 for relevance.
  Write top 5 items to daily-logs/scout-{date}.md. Nothing else.
model: claude-haiku-4-5
output: daily-logs/scout-{date}.md

---

# news-summary.yaml
name: news_summary
schedule: "0 8 * * *"     # 8:00 AM daily — AFTER scout completes
session: isolated           # own clean session
context_files:
  - daily-logs/scout-{date}.md   # reads scout output as INPUT, not session state
prompt: |
  Read scout-{date}.md. Write a 5-sentence Telegram-ready summary.
  Post via telegram_send skill. Log result to daily-logs/summary-{date}.md.
model: claude-sonnet-4-6   # better writing quality for human-facing output
output: daily-logs/summary-{date}.md

Pass data between jobs through files, not sessions. The scout writes to daily-logs/scout-{date}.md; the summary reads that file as a declared input. This is the correct handoff pattern — it keeps sessions clean and makes the pipeline auditable.
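Whatever builds each isolated session can enforce this handoff. It reads only the declared `context_files`, expands `{date}`, and fails loudly when an upstream output is missing. Cron starts jobs on the clock and has no notion that the 8 AM job depends on the 6 AM one. A missing scout file should therefore stop the summary instead of letting it run on empty input. This is a Python sketch with hypothetical helper names. It assumes `{date}` expands to an ISO date such as 2026-01-15, matching the daily-log filenames.

```python
from datetime import date
from pathlib import Path
from typing import List


def resolve(template: str, day: date) -> str:
    """Expand {date} to an ISO date, e.g. scout-{date}.md -> scout-2026-01-15.md (assumed format)."""
    return template.replace("{date}", day.isoformat())


def build_isolated_context(workspace: Path, context_files: List[str], day: date) -> str:
    """Fresh context built only from declared inputs; nothing carries over from other jobs."""
    parts = []
    for template in context_files:
        path = workspace / resolve(template, day)
        if not path.exists():
            # Cron fired on schedule but the upstream job has not produced its output.
            raise FileNotFoundError(f"declared input missing: {path}")
        parts.append(f"=== {path.name} ===\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```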

5. Crash Recovery via SOUL.md

Without explicit startup instructions, a restarted agent begins from scratch: it re-introduces itself, re-explores the workspace, and often picks up low-priority work instead of resuming the critical task that was interrupted. SOUL.md is the one file that is always loaded first, before any other context. It contains the agent's identity, its standing orders, and a mandatory startup sequence that forces autonomous resume.

SOUL.md — always-loaded agent identity file
# SOUL.md — Agent Identity and Startup Protocol
# This file is loaded FIRST on every session start, including crash restarts.

## Identity
You are the ContentOps agent for [Project Name].
Your permanent goal: keep the content pipeline running autonomously.
You do not ask for permission for tasks already in active-tasks.md.

## Startup Sequence (MANDATORY — run in this exact order)
1. Read active-tasks.md -> identify the first IN PROGRESS task
2. Read mistakes.md -> scan for any mistake relevant to that task
3. Check self-critique.md -> is a 4h review overdue?
4. Resume IN PROGRESS task immediately, without any preamble

## Standing Orders
- Never leave active-tasks.md without at least one IN PROGRESS row
- If you are unsure what to do, re-read project-state.md before asking a human
- Log every tool call failure to mistakes.md before retrying
- Archive any session file over 2 MB before continuing

## Emergency Recovery
If active-tasks.md is empty or corrupted:
  1. Log "RECOVERY: active-tasks.md missing/empty" to daily-logs/{date}.md
  2. Re-derive top 3 tasks from project-state.md
  3. Add them to active-tasks.md with status QUEUED
  4. Begin the first task

SOUL.md must never be archived or rotated. It is a permanent fixture. Keep it under 60 lines — if it grows longer than that, the startup sequence becomes too slow and the agent starts skimming it. Move any background context to project-state.md.
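Some parts of the startup protocol are deterministic, so a launcher can check them before the model's first turn. Two examples are the 60-line budget for SOUL.md and step 1 of Emergency Recovery, which logs that active-tasks.md is missing or empty. Re-deriving tasks from project-state.md remains the agent's job. This sketch detects only the missing or empty case, since a "corrupted" file would need a format check. The names are hypothetical, and the workspace layout from pattern 1 is assumed.

```python
from datetime import datetime
from pathlib import Path
from typing import List

MAX_SOUL_LINES = 60  # line budget stated for SOUL.md


def preflight(workspace: Path, now: datetime) -> List[str]:
    """Checks a launcher could run before the agent's first turn; returns problems found."""
    problems: List[str] = []

    soul = workspace / "SOUL.md"
    if not soul.exists():
        problems.append("SOUL.md missing")
    elif len(soul.read_text(encoding="utf-8").splitlines()) > MAX_SOUL_LINES:
        problems.append(f"SOUL.md exceeds {MAX_SOUL_LINES} lines")

    tasks = workspace / "active-tasks.md"
    if not tasks.exists() or not tasks.read_text(encoding="utf-8").strip():
        # Emergency Recovery step 1: record the event in today's daily log.
        log = workspace / "daily-logs" / f"{now.date().isoformat()}.md"
        log.parent.mkdir(parents=True, exist_ok=True)
        with log.open("a", encoding="utf-8") as fh:
            fh.write(f"{now:%H:%M} RECOVERY: active-tasks.md missing/empty\n")
        # Steps 2-4 (re-deriving tasks from project-state.md) are left to the agent.
        problems.append("active-tasks.md missing/empty")
    return problems
```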

6. Model Routing by Trust Level

Not all agent tasks deserve the same model. Using a strong model for everything is expensive; using a weak model for everything creates a prompt injection risk when the agent processes untrusted external content. The correct heuristic is: route by trust level, not by task complexity. Untrusted content (web pages, user emails, third-party API responses) always goes through the strongest available model. Internal bookkeeping uses the cheapest fast model. Human-facing output sits in the middle.

Model routing policy (config/routing.yaml)
# Model Routing Policy
# Rule: trust level determines model, not task difficulty

routes:
  # UNTRUSTED EXTERNAL CONTENT — strongest model, prompt injection resistance
  - match:
      source: [web, email, rss, third_party_api, user_upload]
    model: claude-opus-4-6
    rationale: "Untrusted content may contain prompt injection. Strongest model is hardest to hijack."

  # HUMAN-FACING OUTPUT — quality matters
  - match:
      output_channel: [telegram, email_reply, report, blog_post]
    model: claude-sonnet-4-6
    rationale: "Writing quality affects user trust. Mid-tier balances cost and output quality."

  # INTERNAL BOOKKEEPING — fast and cheap
  - match:
      task_type: [heartbeat, log_summary, file_archive, task_update, self_critique]
    model: claude-haiku-4-5
    rationale: "Trusted internal data, no injection risk. Speed and cost matter more than quality."

  # FALLBACK
  - match: "*"
    model: claude-sonnet-4-6
    rationale: "Safe default for unclassified tasks."

Never allow cost-optimization to override trust-level routing. If a budget alert fires and you are tempted to downgrade the web-fetch skill to Haiku to save money — do not. The prompt injection risk on untrusted content is real and the cost of a successful injection far exceeds the model cost difference.
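The routing file reads naturally as an ordered, first-match rule list with trust checked first. Under that reading, a page fetched from the web and posted to Telegram still goes to the strongest model. This evaluation order is an assumption about how such a policy would be applied, not documented OpenClaw behavior. The sketch below mirrors config/routing.yaml in Python. It also adds a hypothetical `apply_override` hook that refuses cost downgrades for untrusted sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Routing attributes of a task; any of them may be unset."""
    source: Optional[str] = None
    output_channel: Optional[str] = None
    task_type: Optional[str] = None


# (attribute, matching values, model) in priority order: trust is checked first.
ROUTES = [
    ("source", {"web", "email", "rss", "third_party_api", "user_upload"}, "claude-opus-4-6"),
    ("output_channel", {"telegram", "email_reply", "report", "blog_post"}, "claude-sonnet-4-6"),
    ("task_type", {"heartbeat", "log_summary", "file_archive", "task_update", "self_critique"}, "claude-haiku-4-5"),
]
FALLBACK_MODEL = "claude-sonnet-4-6"
UNTRUSTED_SOURCES = ROUTES[0][1]


def route(task: Task) -> str:
    """Return the model of the first rule whose attribute matches, else the fallback."""
    for attribute, values, model in ROUTES:
        if getattr(task, attribute) in values:
            return model
    return FALLBACK_MODEL


def apply_override(task: Task, requested_model: str) -> str:
    """Hypothetical budget hook: honor a model override unless the task touches untrusted input."""
    if task.source in UNTRUSTED_SOURCES:
        return route(task)  # ignore the downgrade; trust routing wins
    return requested_model
```

Because `apply_override` is keyed on the input source rather than the task name, a budget alert can still downgrade internal bookkeeping. It can never downgrade a web or RSS fetch.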

Pattern Cheat Sheet

| Pattern | Problem Solved | Key Config |
|---------|----------------|------------|
| 5-File Memory Split | Memory bloat, context confusion | active-tasks.md, mistakes.md, self-critique.md, project-state.md, daily-logs/ |
| Skill USE/DON'T USE Guards | Wrong-tool invocation | Add USE WHEN / DON'T USE WHEN + COST to every skill description |
| Heartbeat Checklist | Goal drift, stale tasks | schedule: "*/30 * * * *", session: isolated, under 20 lines |
| Isolated Cron Sessions | Context bleed between jobs | session: isolated, pass data via files not sessions |
| Crash Recovery (SOUL.md) | Restart from scratch after crash | SOUL.md always loaded first, mandatory startup sequence |
| Model Routing by Trust | Cost waste + prompt injection | external → opus, output → sonnet, internal → haiku |
