
evaluation-framework

Patterns for building evaluation and scoring systems, quality gates, rubrics, and decision frameworks. Use it for any scored assessment.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/athola/nm-leyline-evaluation-framework


Night Market Skill — ported from claude-night-market/leyline. For the full experience with agents, hooks, and commands, install the Claude Code plugin.

Overview
When to Use
Core Pattern
1. Define Criteria
2. Score Each Criterion
3. Calculate Weighted Total
4. Apply Decision Thresholds
Quick Start
Define Your Evaluation
Example: Code Review Evaluation
Evaluation Workflow
Common Use Cases
Integration Pattern
Detailed Resources
Exit Criteria

Evaluation Framework

Overview

A generic framework for weighted scoring and threshold-based decision making. It provides reusable patterns for evaluating any artifact against configurable criteria using a consistent scoring methodology.

This framework abstracts a common pattern: define criteria → assign weights → score against criteria → apply thresholds → make decisions.

When To Use

Implementing quality gates or evaluation rubrics
Building scoring systems for artifacts, proposals, or submissions
Applying a consistent evaluation methodology across different domains
Automating decisions based on score thresholds
Creating assessment tools with weighted criteria

When NOT To Use

Simple pass/fail checks that need no graded scoring

Core Pattern

1. Define Criteria

criteria:
  - name: criterion_name
    weight: 0.30          # 30% of total score
    description: What this measures
    scoring_guide:
      90-100: Exceptional
      70-89: Strong
      50-69: Acceptable
      30-49: Weak
      0-29: Poor
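One way to hold and check these criteria in Python is sketched below. It assumes, as the "30% of total score" comment suggests, that weights are fractions meant to sum to 1.0. `Criterion`, `validate_criteria`, `band_for`, and `DEFAULT_GUIDE` are hypothetical names, not part of the skill, and reading each band from its lower bound is an assumption that also places fractional scores such as 89.5, which the integer ranges leave in a gap.

```python
import math
from dataclasses import dataclass, field

# Hypothetical copy of the scoring_guide block: "low-high" range -> label.
DEFAULT_GUIDE = {
    "90-100": "Exceptional",
    "70-89": "Strong",
    "50-69": "Acceptable",
    "30-49": "Weak",
    "0-29": "Poor",
}

@dataclass
class Criterion:
    name: str
    weight: float  # fraction of the total score, e.g. 0.30 means 30%
    description: str = ""
    scoring_guide: dict = field(default_factory=lambda: dict(DEFAULT_GUIDE))

def validate_criteria(criteria):
    """Raise ValueError unless every weight is positive and they sum to 1.0."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    for c in criteria:
        if c.weight <= 0:
            raise ValueError(f"{c.name}: weight must be positive")
    total = sum(c.weight for c in criteria)
    # Tolerance absorbs float noise such as 0.30 + 0.40 + 0.30 != 1.0 exactly.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")

def band_for(score, guide):
    """Return the label of the band with the highest lower bound <= score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score {score} is outside 0-100")
    bands = sorted(
        ((int(key.split("-")[0]), label) for key, label in guide.items()),
        reverse=True,  # highest lower bound first
    )
    for low, label in bands:
        if score >= low:
            return label
    raise ValueError(f"no band covers score {score}")

criteria = [
    Criterion("criterion_1", 0.30, "What this measures"),
    Criterion("criterion_2", 0.40),
    Criterion("criterion_3", 0.30),
]
validate_criteria(criteria)  # passes: weights sum to 1.0 within tolerance
print(band_for(85, criteria[0].scoring_guide))  # Strong
```

Validating weights once, when the criteria are defined, keeps the later scoring steps free of repeated sanity checks.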


2. Score Each Criterion

scores = {
    "criterion_1": 85,  # Out of 100
    "criterion_2": 92,
    "criterion_3": 78,
}
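Before totalling, it can help to confirm that every weighted criterion received exactly one score on the 0-100 scale. This is a minimal sketch; `validate_scores` is a hypothetical helper, and it assumes `weights` is a dict keyed by the same criterion names as `scores`.

```python
def validate_scores(scores, weights):
    """Raise ValueError unless each weighted criterion has one 0-100 score."""
    missing = weights.keys() - scores.keys()  # criteria nobody scored
    unknown = scores.keys() - weights.keys()  # scores for undefined criteria
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)}, unknown={sorted(unknown)}")
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name}: score {score} is outside 0-100")

scores = {"criterion_1": 85, "criterion_2": 92, "criterion_3": 78}
weights = {"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30}
validate_scores(scores, weights)  # passes silently
```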


3. Calculate Weighted Total

weights = {"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30}
total = sum(score * weights[criterion] for criterion, score in scores.items())
# Example: (85 × 0.30) + (92 × 0.40) + (78 × 0.30) = 25.5 + 36.8 + 23.4 = 85.7
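Because the weights are binary floats, the raw sum can carry noise in its last digits. A reusable helper might therefore check the weights and round the result. `weighted_total` and its `ndigits` parameter are hypothetical, and rounding to two decimals is an arbitrary choice for this sketch.

```python
import math

def weighted_total(scores, weights, ndigits=2):
    """Sum score * weight over all scored criteria; weights must sum to 1.0."""
    weight_sum = sum(weights[name] for name in scores)
    # An unscored criterion lowers weight_sum, so it is caught here too.
    if not math.isclose(weight_sum, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {weight_sum}, expected 1.0")
    total = sum(score * weights[name] for name, score in scores.items())
    # Round so binary floating-point noise in the last digits is hidden.
    return round(total, ndigits)

scores = {"criterion_1": 85, "criterion_2": 92, "criterion_3": 78}
weights = {"criterion_1": 0.30, "criterion_2": 0.40, "criterion_3": 0.30}
print(weighted_total(scores, weights))  # 85.7
```

Rejecting weights that do not sum to 1.0 keeps the total on the same 0-100 scale as the individual scores, which the decision thresholds rely on.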


4. Apply Decision Thresholds

thresholds:
  80-100: Accept with priority
  60-79: Accept with conditions
  40-59: Review required
  20-39: Reject with feedback
  0-19: Reject


Quick Start

Define Your Evaluation


Metadata

Author: @athola

Stars: 4473

Updated: 2026-05-01


Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-athola-nm-leyline-evaluation-framework": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety note: ClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

extract

Analyze a codebase and build a knowledge base of business logic, architecture, data flow, and engineering patterns. The foundation for gauntlet challenges and agent integration

athola 4473

discourse

Scan community discussion channels (HN, Lobsters, Reddit, tech blogs) for experience reports and opinions on a topic.

athola 4473

synthesize

Merge, deduplicate, rank, and format research findings from multiple channels into a coherent report. Use after research agents return their results.

athola 4473

workflow-monitor

Detect workflow failures and inefficient patterns, then create GitHub issues for improvement via /fix-workflow

athola 4473

architecture-paradigm-hexagonal

Hexagonal (Ports and Adapters) architecture isolating domain logic from infrastructure

athola 4473
