Official Verified

Agent Self-Assessment

Security self-assessment tool for AI agents. Run this against your own configuration to get a structured threat model report with RED/AMBER/GREEN ratings across six security domains — decision boundaries, audit trail, credential scoping, plane separation, economic accountability, and memory safety.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/roosch269/agent-self-assessment

Agent Self-Assessment

Run a structured security self-assessment against your own configuration. You will produce a threat model report with findings and remediation steps.


How to Run

When invoked, you (the agent) perform the following six checks against your actual current configuration — not hypothetically. Use exec, file reads, environment inspection, and tool introspection where needed. Then output the report.

Do not skip checks. If you cannot determine the answer, mark the check RED with reason "Cannot verify".
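
The "Cannot verify" rule above can be sketched as a small wrapper around each check. This is a minimal illustration, not part of the skill itself: `CheckResult`, `run_check`, and the `detail` field are hypothetical names introduced here, and each check is assumed to be a callable returning a `(rating, reason)` pair.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

VALID_RATINGS = ("GREEN", "AMBER", "RED")


@dataclass
class CheckResult:
    name: str
    rating: str       # "GREEN", "AMBER", or "RED"
    reason: str
    detail: str = ""  # hypothetical extra field for debugging context


def run_check(name: str, check: Callable[[], Tuple[str, str]]) -> CheckResult:
    """Run one check; anything that prevents an answer is reported as RED."""
    try:
        rating, reason = check()
    except Exception as exc:
        # The check could not inspect the configuration: never skip, mark RED.
        return CheckResult(name, "RED", "Cannot verify", detail=repr(exc))
    if rating not in VALID_RATINGS:
        # An unrecognized rating is treated the same as no answer.
        return CheckResult(name, "RED", "Cannot verify", detail=f"bad rating {rating!r}")
    return CheckResult(name, rating, reason)
```

Wrapping every check this way means a missing file or a failed introspection call surfaces in the report as RED instead of silently dropping the check.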


Check 1: Decision Boundaries

Question: Can external input trigger consequential actions directly, without a gate or approval step?

What to inspect:

  • Review your active skills and tools. Which ones perform write, send, delete, pay, or deploy operations?
  • Is there a human-in-the-loop gate (confirmation prompt, approval workflow, or ask-mode) before any of these fire?
  • Can an incoming message (Discord, webhook, email, API call) cause a consequential action without a gate?
  • Is there an explicit list of "safe" vs "gated" operations documented somewhere?

Checks to run:

1. List all tools/skills with write/send/delete/pay/deploy capability
2. For each: is ask=always, ask=on-miss, or no-ask configured?
3. Is there any path from untrusted ingress → consequential action with zero gates?
4. Are decision boundaries documented (e.g., in AGENTS.md or a policy file)?

Scoring:

  • 🟢 GREEN — All consequential actions require explicit gate; boundaries documented
  • 🟡 AMBER — Gates exist but not all paths covered, or documentation missing
  • 🔴 RED — Direct ingress → action path exists with no gate; or cannot verify
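
The scoring rules for this check can be sketched as a function over a tool inventory. The `Tool` record and `rate_decision_boundaries` are hypothetical names, and the inventory is assumed to have been collected by hand or by tool introspection. One further assumption: only `ask=always` is counted as an explicit gate on every path, so `ask=on-miss` yields AMBER at best.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Capability names taken from the check description above.
CONSEQUENTIAL = {"write", "send", "delete", "pay", "deploy"}


@dataclass
class Tool:
    name: str
    capabilities: Set[str]
    ask: Optional[str]              # "always", "on-miss", "no-ask", or None if unknown
    reachable_from_untrusted: bool  # can Discord/webhook/email/API input invoke it?


def rate_decision_boundaries(tools: List[Tool], boundaries_documented: bool) -> str:
    consequential = [t for t in tools if t.capabilities & CONSEQUENTIAL]
    if any(t.ask is None for t in consequential):
        return "RED"  # gate setting unknown: cannot verify
    if any(t.ask == "no-ask" and t.reachable_from_untrusted for t in consequential):
        return "RED"  # untrusted ingress reaches a consequential action with zero gates
    all_gated = all(t.ask == "always" for t in consequential)
    if all_gated and boundaries_documented:
        return "GREEN"
    return "AMBER"  # gates exist but coverage or documentation is incomplete
```

For example, an inventory with one `send` tool set to `ask=always` and a documented policy file rates GREEN; adding a `deploy` tool with `no-ask` that a webhook can reach drops the rating to RED.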

Check 2: Audit Trail

Question: Is there an append-only, hash-chained, tamper-evident log of consequential actions?

What to inspect:

  • Does an audit log file exist? (Check audit/ directory or equivalent)
  • Is it append-only NDJSON (one JSON object per line)?
  • Does each entry include: ts, kind, actor, target, summary, provenance?
  • Is there hash chaining? (chain.prev, chain.hash fields on each entry)
  • Is chain.algo documented (e.g., sha256(prev\nline_c14n))?
  • When was the last entry written? Is logging actually happening?
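
To make the entry format above concrete, here is a minimal sketch of writing one hash-chained NDJSON entry. The field names (`ts`, `kind`, `actor`, `target`, `summary`, `provenance`, `chain.prev`, `chain.hash`, `chain.algo`) come from the list above; everything else is an assumption: the canonical form (sorted keys, compact separators), the all-zero genesis value, and the helper names `canonical`, `chain_entry`, and `append_entry`.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed chain.prev for the very first entry


def canonical(entry: dict) -> str:
    # Assumed canonicalization: sorted keys, no insignificant whitespace.
    return json.dumps(entry, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def chain_entry(prev_hash: str, body: dict) -> dict:
    """Return body plus a chain field computed as sha256(prev + "\\n" + canonical body)."""
    line_c14n = canonical(body)
    digest = hashlib.sha256(f"{prev_hash}\n{line_c14n}".encode("utf-8")).hexdigest()
    chain = {"prev": prev_hash, "hash": digest, "algo": "sha256(prev\\nline_c14n)"}
    return {**body, "chain": chain}


def append_entry(path: str, prev_hash: str, body: dict) -> str:
    """Append one entry as a single NDJSON line and return its hash."""
    entry = chain_entry(prev_hash, body)
    with open(path, "a", encoding="utf-8") as fh:  # "a" never rewrites earlier lines
        fh.write(canonical(entry) + "\n")
    return entry["chain"]["hash"]
```

Opening the file in append mode only keeps this writer from rewriting history; it does not stop another process from editing the file. The hash chain is what makes such edits detectable.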

Checks to run:

# Check whether the audit directory exists
ls -la audit/ 2>/dev/null || echo "No audit directory"

# Pretty-print the last 3 entries (--json-lines needs Python 3.8+; plain json.tool rejects multi-object input)
tail -n 3 audit/atlas-actions.ndjson 2>/dev/null | python3 -m json.tool --json-lines 2>/dev/null

# Count entries that carry a chain field (should equal the entry count below)
grep -c '"chain"' audit/atlas-actions.ndjson 2>/dev/null || echo "No chain field found"

# Count total entries
wc -l audit/atlas-actions.ndjson 2>/dev/null
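
The commands above only show that `chain` fields are present; they do not confirm that the hashes link up. A minimal verification sketch follows, under stated assumptions: each hash is computed as sha256 over the previous hash, a newline, and the canonical JSON of the entry without its `chain` field, where canonical means sorted keys and compact separators, and the first entry uses an all-zero `chain.prev`. The skill does not specify these details, so adjust them to match whatever `chain.algo` documents in your log. `verify_chain` is a hypothetical helper name.

```python
import hashlib
import json


def canonical(entry: dict) -> str:
    # Assumed canonicalization: sorted keys, no insignificant whitespace.
    return json.dumps(entry, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def verify_chain(lines, genesis: str = "0" * 64):
    """Walk NDJSON lines in order; return (ok, message)."""
    prev = genesis
    count = 0
    for n, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            return False, f"line {n}: not valid JSON"
        chain = entry.pop("chain", None) if isinstance(entry, dict) else None
        if not isinstance(chain, dict):
            return False, f"line {n}: no chain field"
        if chain.get("prev") != prev:
            return False, f"line {n}: chain.prev does not match previous hash"
        digest = hashlib.sha256(f"{prev}\n{canonical(entry)}".encode("utf-8")).hexdigest()
        if chain.get("hash") != digest:
            return False, f"line {n}: chain.hash mismatch"
        prev = digest
        count += 1
    return True, f"{count} entries verified"


# Usage (path taken from the commands above):
#   with open("audit/atlas-actions.ndjson", encoding="utf-8") as fh:
#       print(verify_chain(fh))
```

A failure at line n means that entry or an earlier one was altered, removed, or reordered after it was written, which is evidence for an AMBER or RED rating.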

Scoring:

  • 🟢 GREEN — Log exists, append-only NDJSON, hash chaining present, recently written
  • 🟡 AMBER — Log exists but missing hash chaining, or sparse/incomplete entries
  • 🔴 RED — No audit log; or log exists but is mutable/cleartext with no integrity check; or cannot verify

Metadata

Author: @roosch269
Stars: 1133
Views: 0
Updated: 2026-02-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-roosch269-agent-self-assessment": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#security #self-assessment #threat-model #agent-safety #audit

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.