Official Verified

ml-model-eval-benchmark

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/0x-professor/ml-model-eval-benchmark

ML Model Eval Benchmark

Overview

Produce consistent model ranking outputs from metric-weighted evaluation inputs.

Workflow

  1. Define metric weights and accepted metric ranges.
  2. Ingest model metrics for each candidate.
  3. Compute the weighted score for each candidate and rank the candidates.
  4. Export leaderboard and promotion recommendation.
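
The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of scripts/benchmark_models.py: the metric names, weights, accepted ranges, the min-max normalization, and the tie-break on model name are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical weights and accepted ranges (step 1); not taken from the skill.
WEIGHTS: Dict[str, float] = {"accuracy": 0.5, "f1": 0.3, "latency_ms": 0.2}
RANGES: Dict[str, Tuple[float, float]] = {
    "accuracy": (0.0, 1.0),
    "f1": (0.0, 1.0),
    "latency_ms": (0.0, 500.0),
}
# Metrics where a smaller raw value is better (assumption for this sketch).
LOWER_IS_BETTER = {"latency_ms"}


def normalize(metric: str, value: float) -> float:
    """Map a raw metric onto [0, 1], rejecting values outside the accepted range."""
    lo, hi = RANGES[metric]
    if not lo <= value <= hi:
        raise ValueError(f"{metric}={value} outside accepted range [{lo}, {hi}]")
    scaled = (value - lo) / (hi - lo)
    return 1.0 - scaled if metric in LOWER_IS_BETTER else scaled


def weighted_score(metrics: Dict[str, float]) -> float:
    """Weighted sum of normalized metrics for one candidate (step 3)."""
    return sum(w * normalize(m, metrics[m]) for m, w in WEIGHTS.items())


def rank(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sort by score descending; ties break on model name so output is deterministic."""
    scored = [(name, round(weighted_score(m), 6)) for name, m in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical candidate metrics (step 2).
    candidates = {
        "model_a": {"accuracy": 0.9, "f1": 0.8, "latency_ms": 100.0},
        "model_b": {"accuracy": 0.7, "f1": 0.9, "latency_ms": 50.0},
    }
    for position, (name, score) in enumerate(rank(candidates), start=1):
        print(position, name, score)
```

Rounding the score before sorting and breaking ties on the name keeps the ranking stable across runs and input orderings. The bundled benchmarking guide may prescribe a different tie-break rule.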

Use Bundled Resources

  • Run scripts/benchmark_models.py to generate benchmark outputs.
  • Read references/benchmarking-guide.md for weighting and tie-break guidance.

Guardrails

  • Keep metric names and scales consistent across candidates.
  • Record weighting assumptions in the output.
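
As one possible way to apply these guardrails, the sketch below checks that every candidate reports the same metric names and writes the weights into the exported report. The function names and the report layout are hypothetical, and the check covers names only; scale consistency still has to be enforced through the accepted metric ranges.

```python
import json
from typing import Dict, List, Tuple


def check_consistent_metrics(candidates: Dict[str, Dict[str, float]]) -> None:
    """Raise if any candidate's metric names differ from the first candidate's."""
    expected = None
    for name in sorted(candidates):
        keys = set(candidates[name])
        if expected is None:
            expected = keys
        elif keys != expected:
            mismatch = sorted(keys ^ expected)
            raise ValueError(f"{name} has mismatched metrics: {mismatch}")


def build_report(ranking: List[Tuple[str, float]], weights: Dict[str, float]) -> str:
    """Serialize the leaderboard together with the weights that produced it."""
    report = {
        "weights": weights,  # weighting assumptions recorded in the output
        "leaderboard": [
            {"rank": i + 1, "model": name, "score": score}
            for i, (name, score) in enumerate(ranking)
        ],
    }
    # sort_keys keeps the serialized output byte-identical across runs.
    return json.dumps(report, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    candidates = {
        "model_a": {"accuracy": 0.9, "f1": 0.8},
        "model_b": {"accuracy": 0.7, "f1": 0.9},
    }
    check_consistent_metrics(candidates)
    print(build_report([("model_a", 0.86), ("model_b", 0.78)], {"accuracy": 0.6, "f1": 0.4}))
```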

Metadata

Stars: 4473
Views: 0
Updated: 2026-05-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-0x-professor-ml-model-eval-benchmark": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.