OpenClawBrain v12.2.1

Learned retrieval graph for AI agents. Nodes are document chunks, edges are mutable weighted pointers. The graph learns from outcomes using policy-gradient updates (REINFORCE) and self-regulates via homeostatic decay, synaptic scaling, and tier hysteresis.

Install

pip install openclawbrain              # core (pure Python, zero deps)
pip install "openclawbrain[openai]"    # with OpenAI embeddings

Quick Start

# Build a brain from workspace files
openclawbrain init --workspace ./my-workspace --output ./brain --embedder openai

# Query
openclawbrain query "how do I deploy" --state ./brain/state.json --json

# Learn from outcome (+1 good, -1 bad)
openclawbrain learn --state ./brain/state.json --outcome 1.0 --fired-ids "node1,node2"

# Self-learn (agent-initiated, no human needed)
openclawbrain self-learn --state ./brain/state.json \
  --content "Always download artifacts before terminating instances" \
  --fired-ids "node1,node2" --outcome -1.0 --type CORRECTION

# Health check
openclawbrain doctor --state ./brain/state.json

Core Concepts

Learning Rule: Policy Gradient (default)

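The default is apply_outcome_pg (REINFORCE). At each node on the fired path, the update redistributes probability mass across ALL outgoing edges, so the per-node weight changes sum to approximately zero. On a positive outcome the chosen edge goes up and every alternative goes down, so total weight does not inflate.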

apply_outcome (heuristic) is available as a fallback. It updates only the traversed edges, so it is inflationary.
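The redistribution described above can be illustrated with a minimal sketch of a REINFORCE step at a single node. This is not the library's implementation of apply_outcome_pg: the softmax policy over edge weights, the learning rate, and the names softmax and pg_update are assumptions made for illustration.

```python
import math


def softmax(weights):
    """Turn a node's outgoing edge weights into selection probabilities."""
    m = max(weights.values())
    exps = {k: math.exp(w - m) for k, w in weights.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def pg_update(weights, chosen, outcome, lr=0.1):
    """Per-edge deltas for one node: lr * outcome * (1[edge == chosen] - p_edge).

    Hypothetical helper. Because the probabilities sum to 1, the deltas
    sum to zero: whatever the chosen edge gains, the alternatives lose.
    """
    probs = softmax(weights)
    return {
        target: lr * outcome * ((1.0 if target == chosen else 0.0) - p)
        for target, p in probs.items()
    }


# One node with three outgoing edges; edge "a" was traversed and the outcome was good.
edges = {"a": 0.5, "b": 0.2, "c": 0.1}
deltas = pg_update(edges, chosen="a", outcome=1.0)
new_edges = {k: edges[k] + deltas[k] for k in edges}
```

With a negative outcome the signs flip: the chosen edge is pushed down and the alternatives gain the same total amount.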

Self-Learning

Agents can learn from their own observed outcomes without human feedback (self-correct is available as a CLI/API alias for self-learn):

from openclawbrain.socket_client import OCBClient

with OCBClient('~/.openclawbrain/main/daemon.sock') as client:
    # Agent detected failure
    client.self_learn(
        content='Always download artifacts before terminating',
        fired_ids=['node1', 'node2'],
        outcome=-1.0,
        node_type='CORRECTION',   # penalize + inhibitory edges
    )

    # Agent observed success
    client.self_learn(
        content='Download-then-terminate works reliably',
        fired_ids=['node1', 'node2'],
        outcome=1.0,
        node_type='TEACHING',     # reinforce + positive knowledge
    )

Situation	outcome	type	Effect
Mistake	-1.0	CORRECTION	Penalize path + inhibitory edges
Fact learned	0.0	TEACHING	Inject knowledge only
Success	+1.0	TEACHING	Reinforce path + inject knowledge
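The table above can be encoded as a small lookup so an agent chooses the outcome and type consistently. This is a sketch: SELF_LEARN_PRESETS and self_learn_kwargs are hypothetical names, not part of openclawbrain. Only the keyword arguments content, fired_ids, outcome, and node_type come from the client example earlier in this section.

```python
# Hypothetical presets mirroring the Situation / outcome / type table.
SELF_LEARN_PRESETS = {
    "mistake": {"outcome": -1.0, "node_type": "CORRECTION"},
    "fact": {"outcome": 0.0, "node_type": "TEACHING"},
    "success": {"outcome": 1.0, "node_type": "TEACHING"},
}


def self_learn_kwargs(situation, content, fired_ids):
    """Build keyword arguments for client.self_learn from a situation name."""
    preset = SELF_LEARN_PRESETS[situation]  # raises KeyError on an unknown situation
    return {"content": content, "fired_ids": list(fired_ids), **preset}


kwargs = self_learn_kwargs(
    "mistake",
    "Always download artifacts before terminating",
    ["node1", "node2"],
)
```

Inside a `with OCBClient(...) as client:` block, the result would be passed as `client.self_learn(**kwargs)`.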

Self-Regulation (automatic, no tuning needed)

Homeostatic decay: the decay half-life auto-adjusts to keep the reflex edge ratio at 5-15%. The half-life is bounded to 60-300 cycles.
Synaptic scaling: a soft per-node weight budget (5.0), applied with fourth-root scaling during maintenance, prevents hub domination.
Tier hysteresis: a habitual band of 0.15-0.6 prevents threshold thrashing.
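As a rough illustration of the first two mechanisms, the sketch below adjusts a decay half-life toward the 5-15% reflex band and applies a soft, fourth-root weight budget. Only the numeric bounds (5-15%, 60-300 cycles, budget 5.0, fourth root) come from the text above. The function names, the 10% adjustment step, the direction of the adjustment, and the exact scaling formula are assumptions, not the library's code.

```python
def adjust_half_life(half_life, reflex_ratio, low=0.05, high=0.15,
                     step=1.1, min_hl=60.0, max_hl=300.0):
    """Nudge the decay half-life so the reflex edge ratio drifts into [low, high]."""
    if reflex_ratio > high:
        half_life /= step   # assumed: too many reflex edges, so decay faster
    elif reflex_ratio < low:
        half_life *= step   # assumed: too few reflex edges, so decay slower
    return max(min_hl, min(max_hl, half_life))


def decay(weight, half_life):
    """Exponential decay for one cycle; weight halves after half_life cycles."""
    return weight * 0.5 ** (1.0 / half_life)


def synaptic_scale(out_weights, budget=5.0):
    """Soft budget: shrink a node's outgoing weights by (budget / total) ** 0.25.

    The fourth root makes the correction gentle, so one pass moves the total
    toward the budget without clamping it there.
    """
    total = sum(abs(w) for w in out_weights.values())
    if total <= budget:
        return dict(out_weights)
    factor = (budget / total) ** 0.25
    return {k: w * factor for k, w in out_weights.items()}


hub = {"a": 4.0, "b": 3.0, "c": 3.0}   # total 10.0, twice the budget
scaled = synaptic_scale(hub)
```

Because the scaling is soft, a hub at twice the budget keeps most of its weight after one pass, and repeated maintenance cycles pull it closer to the budget.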

openclawbrain

Install via CLI (Recommended)