ClawKit

Agent Fundamentals

An AI agent is more than a chatbot. It perceives its environment, reasons about goals, and acts on the world. This page explains the core concepts behind autonomous agents.

The Agent Loop

Every agent — from simple scripts to OpenClaw — follows the same fundamental cycle:

1. Perceive: observe the environment.
2. Reason: decide the next action.
3. Act: execute the action.
4. Evaluate: check the result.

This loop repeats until the goal is achieved or the agent runs out of steps.
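The loop above can be sketched in a few lines. This is an illustrative skeleton, not OpenClaw's implementation; `perceive`, `decide`, `act`, and `goal_met` are hypothetical callables you would supply.

```python
def run_agent(perceive, decide, act, goal_met, max_steps=25):
    """Minimal perceive-reason-act-evaluate loop (illustrative sketch)."""
    for _ in range(max_steps):
        observation = perceive()      # Perceive: observe the environment
        action = decide(observation)  # Reason: decide the next action
        result = act(action)          # Act: execute the action
        if goal_met(result):          # Evaluate: check the result
            return result
    return None                       # Ran out of steps before reaching the goal
```

The `max_steps` cap is what enforces "runs out of steps": without it, a confused agent could loop forever.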

The Three Pillars

Modern agent architectures rest on three capabilities. OpenClaw implements all three:

Memory

Agents need context to make good decisions. Short-term memory is the conversation history within a session. Long-term memory persists across sessions using vector databases or file storage.

OpenClaw uses in-context history (short-term) and plans local vector DB support for v3 (long-term).
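Short-term memory can be as simple as a bounded buffer of conversation turns, with the oldest turns dropping off when the window fills. A minimal sketch (the `SessionMemory` class is a hypothetical helper, not part of OpenClaw's API):

```python
from collections import deque

class SessionMemory:
    """Short-term memory: a bounded in-context conversation history."""

    def __init__(self, max_turns=50):
        # deque with maxlen silently evicts the oldest turn when full
        self.turns = deque(maxlen=max_turns)

    def add(self, role, content):
        self.turns.append({"role": role, "content": content})

    def as_context(self):
        """Return the history in the shape chat APIs typically expect."""
        return list(self.turns)
```

Long-term memory would persist these turns (or embeddings of them) outside the session instead of letting them evict.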

Planning

Given a complex goal, the agent must break it into achievable sub-tasks. This can be done upfront (plan-then-execute) or reactively (decide one step at a time based on the current state).

OpenClaw uses reactive planning by default, with optional goal decomposition for complex tasks.
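The two planning styles can be contrasted in code. This is a schematic comparison under stated assumptions: `plan_fn`, `next_step_fn`, `act`, and `done` are hypothetical callables, and real planners would be far richer.

```python
def plan_then_execute(goal, plan_fn, act):
    """Deliberative: build the full plan upfront, then follow it."""
    plan = plan_fn(goal)  # e.g. ["open page", "fill form", "submit"]
    return [act(step) for step in plan]

def reactive(goal, next_step_fn, act, done, max_steps=25):
    """Reactive: pick one step at a time from the current state."""
    state = None
    for _ in range(max_steps):
        step = next_step_fn(goal, state)  # decide based on what we see now
        state = act(step)
        if done(state):
            break
    return state
```

Note the structural difference: the deliberative version never looks at intermediate results, while the reactive version consults the current state before every step.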

Tool Use

An LLM alone can only generate text. Tool use gives it the ability to interact with the world — clicking buttons, reading files, calling APIs. This is what transforms a chatbot into an agent.

OpenClaw's entire Skill System is built around tool use via MCP.
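Under the hood, tool use boils down to a registry of named functions plus a dispatcher that executes the tool calls the model emits. A toy sketch of that pattern (the decorator, registry, and call format here are illustrative, not OpenClaw's or MCP's actual API):

```python
import json

TOOLS = {}

def tool(name):
    """Register a function under a tool name (illustrative pattern)."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@tool("read_file")
def read_file(path):
    """Example tool: read a local file's contents."""
    with open(path) as f:
        return f.read()

def dispatch(call_json):
    """Execute a model-emitted call like {"tool": "...", "args": {...}}."""
    call = json.loads(call_json)
    return TOOLS[call["tool"]](**call["args"])
```

In a real system the dispatcher would also validate arguments against a schema and sandbox side effects; MCP standardizes exactly this tool-description and invocation layer.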

Reactive vs. Deliberative Agents

There are two main schools of thought on how agents should operate:

Reactive (OpenClaw Default)

The agent observes the current state and decides the very next action. No upfront plan. Highly adaptable to dynamic environments like web pages.

+ Handles unexpected changes well
+ Lower latency per step
- Can get stuck in loops
- Less efficient for long tasks
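The "stuck in loops" failure mode is common enough that reactive agents often carry a loop detector. A minimal heuristic, with an illustrative window size, flags the agent when its recent observations start repeating:

```python
def detect_loop(history, window=3):
    """Return True if the last `window` observations exactly repeat the
    `window` observations before them (a simple stuck-loop heuristic)."""
    if len(history) < 2 * window:
        return False
    return history[-window:] == history[-2 * window:-window]
```

On detection, the agent can break the cycle by re-planning, backtracking, or asking for help rather than burning its remaining steps.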

Deliberative (Plan-Then-Execute)

The agent creates a full plan first, then follows it step by step. More efficient but brittle when the environment changes mid-execution.

+ More efficient for known tasks
+ Easier to debug and review
- Breaks when environment changes
- Higher upfront cost

Why Context Length Matters

Every step the agent takes adds to its conversation history. After 20-30 steps, the context can grow to 50K+ tokens. This matters because:

  • Cost: You pay per token, and every step resends the full accumulated history. Total cost therefore grows roughly quadratically with the number of steps.
  • Quality: LLMs perform worse with very long contexts. The agent may "forget" earlier observations.
  • Speed: Longer prompts take more time to process, slowing down the agent loop.
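The cost point is worth making concrete. If each step appends roughly the same number of tokens to the history, the total prompt tokens billed over a run is the sum 1 + 2 + … + n times that per-step size. A quick back-of-the-envelope calculator (the 2,000 tokens/step figure below is an illustrative assumption):

```python
def total_prompt_tokens(steps, tokens_per_step):
    """Total prompt tokens billed across a run, assuming each step
    resends the full history and history grows linearly."""
    return sum(i * tokens_per_step for i in range(1, steps + 1))

# A 30-step run adding ~2,000 tokens per step bills ~930K prompt tokens
# in total, even though the final context is only ~60K tokens.
```

Prompt caching attacks exactly this sum: the unchanged prefix of each step's context is billed at a discounted rate.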

This is why choosing a cost-effective model with good caching (like DeepSeek V3.2) is critical for agent workloads. See our cost comparison for details.

Key Insight

The best agents are not the ones with the most powerful LLM — they're the ones with the best observation compression. OpenClaw's DOM-to-text compressor reduces a 500KB webpage to ~2KB of relevant signal.
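To make the compression idea concrete, here is a toy DOM-to-text reducer: strip scripts, styles, and markup, collapse whitespace, and truncate to a budget. This is a deliberately crude sketch; OpenClaw's actual compressor is far more selective about which elements carry signal.

```python
import re

def compress_html(html, max_chars=2000):
    """Toy observation compressor: reduce raw HTML to a short text signal."""
    # Drop script/style blocks wholesale: markup-heavy, zero signal
    text = re.sub(r"(?is)<(script|style)[^>]*>.*?</\1>", " ", html)
    # Drop all remaining tags, keeping only the visible text
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Collapse runs of whitespace and enforce the character budget
    text = re.sub(r"\s+", " ", text).strip()
    return text[:max_chars]
```

Even this naive version often shrinks a page by two orders of magnitude; a production compressor would additionally keep interactive elements (buttons, links, form fields) with stable identifiers so the agent can act on them.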
