Official Verified

evaluation-framework

Patterns for building evaluation and scoring systems, quality gates, rubrics, and decision frameworks. Use for any scored assessment.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/athola/nm-leyline-evaluation-framework

Night Market Skill — ported from claude-night-market/leyline. For the full experience with agents, hooks, and commands, install the Claude Code plugin.

Table of Contents

  • Overview
  • When to Use
  • Core Pattern
  • 1. Define Criteria
  • 2. Score Each Criterion
  • 3. Calculate Weighted Total
  • 4. Apply Decision Thresholds
  • Quick Start
  • Define Your Evaluation
  • Example: Code Review Evaluation
  • Evaluation Workflow
  • Common Use Cases
  • Integration Pattern
  • Detailed Resources
  • Exit Criteria

Evaluation Framework

Overview

A generic framework for weighted scoring and threshold-based decision making. It provides reusable patterns for evaluating any artifact against configurable criteria with a consistent scoring methodology.

This framework abstracts a common pattern: define criteria → assign weights → score against criteria → apply thresholds → make decisions.

When To Use

  • Implementing quality gates or evaluation rubrics
  • Building scoring systems for artifacts, proposals, or submissions
  • Applying a consistent evaluation methodology across different domains
  • Automating decisions with score thresholds
  • Creating assessment tools with weighted criteria

When NOT To Use

  • Simple pass/fail checks that do not need a score

Core Pattern

1. Define Criteria

criteria:
  - name: criterion_name
    weight: 0.30          # 30% of total score
    description: What this measures
    scoring_guide:
      90-100: Exceptional
      70-89: Strong
      50-69: Acceptable
      30-49: Weak
      0-29: Poor

Verification: Run the command with --help flag to verify availability.
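The criteria definition above can be mirrored in Python so that it is checked before any scoring happens. The following is a minimal sketch, not part of the skill itself: `Criterion`, `validate_criteria`, and `band` are hypothetical names, and the requirement that weights sum to 1.0 is an assumption inferred from the "30% of total score" comment.

```python
import math
from dataclasses import dataclass


# Hypothetical helper type; the field names mirror the YAML keys above.
@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float          # fraction of the total score, e.g. 0.30
    description: str = ""


# Bands taken from the scoring_guide above: (lower bound, label), highest first.
SCORING_GUIDE = [
    (90, "Exceptional"),
    (70, "Strong"),
    (50, "Acceptable"),
    (30, "Weak"),
    (0, "Poor"),
]


def validate_criteria(criteria):
    """Raise ValueError unless every weight is positive and they sum to 1.0.

    Assumption: weights are fractions of a whole, as the 30% comment suggests.
    """
    if any(c.weight <= 0 for c in criteria):
        raise ValueError("every weight must be positive")
    total = sum(c.weight for c in criteria)
    if not math.isclose(total, 1.0):
        raise ValueError(f"weights sum to {total}, expected 1.0")


def band(score):
    """Return the scoring-guide label for a score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score {score} is outside 0-100")
    # The first band whose lower bound is not above the score wins.
    return next(label for floor, label in SCORING_GUIDE if score >= floor)
```

Validating the weights once, up front, keeps a mistyped weight from silently shifting every later total.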

2. Score Each Criterion

scores = {
    "criterion_1": 85,  # Out of 100
    "criterion_2": 92,
    "criterion_3": 78,
}
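Before weighting, it can help to confirm that every criterion has a score and that each score falls in the 0-100 range. A small sketch of such a check follows; `check_scores` is a hypothetical helper, not something the skill provides.

```python
def check_scores(scores, criterion_names, low=0, high=100):
    """Return a list of problems; an empty list means the scores are usable."""
    problems = []
    # Every defined criterion needs a score.
    for name in criterion_names:
        if name not in scores:
            problems.append(f"missing score for {name}")
    # Every score must belong to a known criterion and sit inside the range.
    for name, value in scores.items():
        if name not in criterion_names:
            problems.append(f"unknown criterion {name}")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems


scores = {"criterion_1": 85, "criterion_2": 92, "criterion_3": 78}
names = ["criterion_1", "criterion_2", "criterion_3"]
print(check_scores(scores, names))  # prints []
```

Returning a list instead of raising on the first problem lets a reviewer see every issue in one pass.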


3. Calculate Weighted Total

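weights = {"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30}  # one weight per scored criterion
total = sum(score * weights[criterion] for criterion, score in scores.items())
# Example: (85 × 0.30) + (92 × 0.40) + (78 × 0.30) = 85.7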
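The weighted sum can also be wrapped in a function that refuses mismatched inputs. This is a sketch under the assumption that scores and weights are plain dictionaries keyed by criterion name; `weighted_total` is a hypothetical name.

```python
def weighted_total(scores, weights):
    """Return the weighted sum of scores; both dicts must share the same keys."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same criteria")
    return sum(scores[name] * weights[name] for name in scores)


weights = {"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30}
scores = {"criterion_1": 85, "criterion_2": 92, "criterion_3": 78}

# 25.5 + 36.8 + 23.4
total = weighted_total(scores, weights)
print(round(total, 1))  # prints 85.7
```

Rounding is applied only for display, so the unrounded total remains available for threshold comparisons.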


4. Apply Decision Thresholds

thresholds:
  80-100: Accept with priority
  60-79: Accept with conditions
  40-59: Review required
  20-39: Reject with feedback
  0-19: Reject
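The threshold table above can be turned into a lookup on the weighted total. In this sketch, `decide` is a hypothetical helper, and each band is treated as starting at its lower bound. That is an assumption: the table lists whole-number ranges, so a fractional total such as 79.5 is placed in the 60-79 band here.

```python
# Bands from the thresholds above: (lower bound, decision), highest first.
THRESHOLDS = [
    (80, "Accept with priority"),
    (60, "Accept with conditions"),
    (40, "Review required"),
    (20, "Reject with feedback"),
    (0, "Reject"),
]


def decide(total):
    """Map a weighted total between 0 and 100 to a decision label."""
    if not 0 <= total <= 100:
        raise ValueError(f"total {total} is outside 0-100")
    for floor, decision in THRESHOLDS:
        if total >= floor:
            return decision


print(decide(85.7))  # prints Accept with priority
```

Keeping the bands in a list makes the cut-offs easy to adjust per domain without touching the lookup logic.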


Quick Start

Define Your Evaluation

Metadata

Author: @athola
Stars: 4473
Views: 0
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-athola-nm-leyline-evaluation-framework": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.