Agent Architecture Evaluator

Version: 1.0.0

Overview

This skill reviews the architecture of an agent system, not just its prompts or its attached skills.

Use it for architectures involving components such as:

planner / executor splits
routers and specialists
tool-use layers
memory systems
human approval gates
multi-agent coordination

Use this skill when

A user wants to assess an existing agent architecture.
Reliability, latency, cost, or coordination problems appear to be architectural.
A team needs a structured architecture review and optimization roadmap.
You need system-level test scenarios rather than single-skill evals.

Do not use this skill when

The problem is one isolated skill.
The task is to create a new skill from scratch.
The main need is portfolio review across many related skills.

Use agent-test-measure-refine or agent-skill-portfolio-evaluator in those cases.

Output contract

Always produce these named outputs:

architecture_inventory
failure_mode_map
architecture_test_plan
optimization_roadmap
measurement_plan
architecture_recommendation

Review dimensions

Evaluate at least these dimensions:

component clarity
routing correctness
memory usefulness
coordination reliability
cost and latency efficiency
observability and debuggability

Quick start

Map the current architecture.
Identify critical paths and failure-prone handoffs.
Define architecture-level test scenarios.
Identify bottlenecks in routing, memory, tools, or coordination.
Recommend the smallest structural changes with the highest leverage.

Workflow

1. Build the architecture inventory

Capture:

components
responsibilities
inputs and outputs
state or memory boundaries
human approval points
observability signals

2. Map failure modes

Look for:

planner produces unusable tasks
router sends work to the wrong specialist
memory pollutes current decisions
tool calls are slow, redundant, or poorly validated
multi-agent handoffs lose context
approval gates appear too late

3. Design system tests

Cover:

happy path
degraded upstream input
partial component failure
tool unavailability
stale or noisy memory
high-latency coordination
rollback or recovery behavior

See references/architecture-review-framework-v1.0.0.md.

4. Prioritize architectural changes

Prefer:

clarifying responsibilities before adding components
removing weak indirection
tightening interface contracts
adding observability before adding complexity
isolating state when cross-contamination is likely

5. Define measurement

Recommend concrete metrics where available:

task success rate
retry rate
fallback rate
cost per successful task
latency by stage
human intervention rate

agent-architecture-evaluator

Install via CLI (Recommended)