Agent Fundamentals
An AI agent is more than a chatbot. It perceives its environment, reasons about goals, and acts on the world. This page explains the core concepts behind autonomous agents.
The Agent Loop
Every agent — from simple scripts to OpenClaw — follows the same fundamental cycle:
1. Observe the current state of the environment.
2. Reason about the next action, given the goal and the history so far.
3. Act by executing that action, which changes the environment.
This loop repeats until the goal is achieved or the agent runs out of steps.
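As a concrete illustration, here is a minimal sketch of that observe-reason-act cycle, using a toy counter environment in place of a real LLM and real tools:

```python
def run_agent(goal: int, max_steps: int = 20) -> list[int]:
    """Run the observe -> reason -> act loop until the counter reaches `goal`
    or the step budget is exhausted. Returns the history of observations."""
    state = {"counter": 0}
    history = []
    for _ in range(max_steps):
        observation = state["counter"]   # 1. Observe the environment
        history.append(observation)
        if observation >= goal:          # 2. Reason: is the goal achieved?
            break
        action = "increment"             # 2. Reason: choose the next action
        if action == "increment":        # 3. Act on the world
            state["counter"] += 1
    return history
```

A real agent replaces the hardcoded "increment" decision with an LLM call and the counter with whatever environment it operates in, but the control flow is the same.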
The Three Pillars
Modern agent architectures rest on three capabilities. OpenClaw implements all three:
Memory
Agents need context to make good decisions. Short-term memory is the conversation history within a session. Long-term memory persists across sessions using vector databases or file storage.
OpenClaw uses in-context history (short-term) and plans local vector DB support for v3 (long-term).
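The short-term side can be as simple as a rolling message buffer that evicts the oldest entries once a token budget is exceeded. A minimal sketch, assuming a rough four-characters-per-token estimate (an illustrative assumption, not OpenClaw's actual accounting):

```python
from collections import deque

class ShortTermMemory:
    """Rolling in-context history capped at a token budget.
    Token counts are estimated at ~4 characters per token (an assumption)."""

    def __init__(self, max_tokens: int = 8000):
        self.max_tokens = max_tokens
        self.messages: deque[str] = deque()
        self._tokens = 0

    @staticmethod
    def estimate_tokens(text: str) -> int:
        return max(1, len(text) // 4)

    def add(self, message: str) -> None:
        self.messages.append(message)
        self._tokens += self.estimate_tokens(message)
        # Evict the oldest messages once the budget is exceeded.
        while self._tokens > self.max_tokens and len(self.messages) > 1:
            self._tokens -= self.estimate_tokens(self.messages.popleft())

    def context(self) -> str:
        return "\n".join(self.messages)
```

Long-term memory would sit behind the same interface, backed by a vector database or file storage instead of an in-process deque.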
Planning
Given a complex goal, the agent must break it into achievable sub-tasks. This can be done upfront (plan-then-execute) or reactively (decide one step at a time based on the current state).
OpenClaw uses reactive planning by default, with optional goal decomposition for complex tasks.
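Goal decomposition can be sketched as a recursive expansion, where a planning call either breaks a goal into sub-goals or marks it as directly executable. Here `llm_decompose` is a hypothetical stand-in for that LLM call:

```python
from typing import Callable

def plan(goal: str, llm_decompose: Callable[[str], list[str]]) -> list[str]:
    """Recursively expand a goal into a flat list of executable steps.
    `llm_decompose` returns sub-goals, or [] when the goal is atomic."""
    subgoals = llm_decompose(goal)
    if not subgoals:
        return [goal]          # atomic: execute directly
    steps: list[str] = []
    for sub in subgoals:
        steps.extend(plan(sub, llm_decompose))
    return steps
```

A reactive agent skips this expansion entirely and asks for one step at a time.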
Tool Use
An LLM alone can only generate text. Tool use gives it the ability to interact with the world — clicking buttons, reading files, calling APIs. This is what transforms a chatbot into an agent.
OpenClaw's entire Skill System is built around tool use via MCP.
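Mechanically, tool use means mapping a structured call emitted by the LLM onto a real function. The registry below is an illustrative sketch of that pattern, not OpenClaw's actual MCP skill API:

```python
from typing import Any, Callable

# Hypothetical tool registry; names are illustrative assumptions.
TOOLS: dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Decorator that registers a function as a callable tool."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register

@tool("read_file")
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

@tool("add")
def add(a: float, b: float) -> float:
    return a + b

def dispatch(call: dict) -> Any:
    """Execute a structured call like {'name': 'add', 'args': {'a': 1, 'b': 2}}."""
    return TOOLS[call["name"]](**call["args"])
```

The LLM only ever emits the structured call; the dispatcher is what actually touches the world, which is also where permissions and sandboxing are enforced.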
Reactive vs. Deliberative Agents
There are two main schools of thought on how agents should operate:
Reactive (OpenClaw Default)
The agent observes the current state and decides the very next action. No upfront plan. Highly adaptable to dynamic environments like web pages.
Deliberative (Plan-Then-Execute)
The agent creates a full plan first, then follows it step by step. More efficient but brittle when the environment changes mid-execution.
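The deliberative style fits in a few lines; `plan_fn` and `execute_fn` below are hypothetical stand-ins for an LLM planning call and a tool dispatcher:

```python
from typing import Callable, Any

def deliberative_run(goal: str,
                     plan_fn: Callable[[str], list[str]],
                     execute_fn: Callable[[str], Any]) -> list[Any]:
    """Plan-then-execute: one upfront planning call, then follow the plan.
    Brittle if the environment changes between planning and execution."""
    plan = plan_fn(goal)
    return [execute_fn(step) for step in plan]
```

Note that nothing between planning and execution re-checks the environment; a reactive agent would fold an observation into every iteration instead.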
Why Context Length Matters
Every step the agent takes adds to its conversation history. After 20-30 steps, the context can grow to 50K+ tokens. This matters because:
- Cost: You pay per token, and the full history is re-sent on every step, so total spend grows quadratically with the number of steps.
- Quality: LLMs perform worse with very long contexts. The agent may "forget" earlier observations.
- Speed: Longer prompts take more time to process, slowing down the agent loop.
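The growth is easy to quantify. Assuming, purely for illustration, a 1,000-token base prompt and roughly 2,000 tokens of new observations per step, re-sending the full history gives:

```python
def context_at_step(step: int, base: int = 1000, per_step: int = 2000) -> int:
    """Prompt size at a given step: base prompt plus all prior observations.
    The base and per-step sizes are illustrative assumptions."""
    return base + step * per_step

def total_tokens(steps: int, base: int = 1000, per_step: int = 2000) -> int:
    """Tokens sent across the whole run; grows quadratically in `steps`."""
    return sum(context_at_step(n, base, per_step) for n in range(1, steps + 1))
```

Under these assumptions the prompt at step 25 is 51,000 tokens and a 25-step run sends 675,000 tokens in total, which is why per-token price and prompt caching dominate agent economics.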
This is why choosing a cost-effective model with good caching (like DeepSeek V3.2) is critical for agent workloads. See our cost comparison for details.
Key Insight
The best agents are not the ones with the most powerful LLM — they're the ones with the best observation compression. OpenClaw's DOM-to-text compressor reduces a 500KB webpage to ~2KB of relevant signal.
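The idea behind such a compressor can be illustrated with Python's standard-library HTML parser: keep visible text, drop markup, scripts, and styles. This toy version only hints at what a production compressor does:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Toy DOM-to-text compressor: keeps visible text, skips <script>/<style>.
    Illustrative only; not OpenClaw's actual compressor."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def compress(html: str) -> str:
    extractor = TextExtractor()
    extractor.feed(html)
    return " ".join(extractor.parts)
```

A real compressor would also preserve interactive affordances (links, buttons, form fields) so the agent can still act, not just read.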