ml-model-eval-benchmark

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/0x-professor/ml-model-eval-benchmark

ML Model Eval Benchmark

Overview

Produce consistent model ranking outputs from metric-weighted evaluation inputs.

Workflow

Define metric weights and accepted metric ranges.
Ingest model metrics for each candidate.
Compute weighted score and ranking.
Export leaderboard and promotion recommendation.
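
The scoring and ranking steps above can be sketched as follows. This is a minimal illustration, not the logic of scripts/benchmark_models.py: the metric names, weights, ranges, the min-max scaling, the "higher is better" convention, and the alphabetical tie-break are all assumptions made for the example. See references/benchmarking-guide.md for the skill's actual weighting and tie-break guidance.

```python
from typing import Dict, List, Tuple

# Hypothetical spec: metric name -> (weight, (min, max) accepted range).
# All metrics are assumed to be "higher is better".
MetricSpec = Dict[str, Tuple[float, Tuple[float, float]]]

METRIC_SPEC: MetricSpec = {
    "accuracy": (0.6, (0.0, 1.0)),
    "f1": (0.4, (0.0, 1.0)),
}


def weighted_score(metrics: Dict[str, float], spec: MetricSpec) -> float:
    """Return the weight-normalized score of one candidate, in [0, 1]."""
    total_weight = sum(weight for weight, _ in spec.values())
    score = 0.0
    for name, (weight, (lo, hi)) in spec.items():
        value = metrics[name]  # KeyError if a candidate lacks a metric
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside accepted range [{lo}, {hi}]")
        # Min-max scale the metric into [0, 1] before weighting.
        score += weight * (value - lo) / (hi - lo)
    return score / total_weight


def rank_models(
    candidates: Dict[str, Dict[str, float]], spec: MetricSpec
) -> List[Tuple[str, float]]:
    """Rank candidates by descending score; ties break by model name (assumed rule)."""
    scored = [(name, weighted_score(m, spec)) for name, m in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    demo = {
        "model-b": {"accuracy": 0.90, "f1": 0.80},
        "model-a": {"accuracy": 0.90, "f1": 0.80},
        "model-c": {"accuracy": 0.50, "f1": 0.50},
    }
    for rank, (name, score) in enumerate(rank_models(demo, METRIC_SPEC), start=1):
        print(rank, name, round(score, 4))
```

Sorting on the pair (negative score, name) makes the order independent of input order, which is one simple way to get a deterministic ranking when two candidates score the same.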

Use Bundled Resources

Run scripts/benchmark_models.py to generate benchmark outputs.
Read references/benchmarking-guide.md for weighting and tie-break guidance.

Guardrails

Keep metric names and scales consistent across candidates.
Record weighting assumptions in the output.
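
One way to apply both guardrails is sketched below, under stated assumptions. The function names check_metric_names and build_report, and the JSON layout of the report, are hypothetical and are not taken from the bundled script.

```python
import json
from typing import Dict, Iterable, List, Tuple


def check_metric_names(
    candidates: Dict[str, Dict[str, float]], expected: Iterable[str]
) -> Dict[str, Dict[str, List[str]]]:
    """Return, per model, the metric names that are missing or unexpected."""
    expected_set = set(expected)
    problems: Dict[str, Dict[str, List[str]]] = {}
    for model, metrics in candidates.items():
        missing = sorted(expected_set - set(metrics))
        extra = sorted(set(metrics) - expected_set)
        if missing or extra:
            problems[model] = {"missing": missing, "extra": extra}
    return problems


def build_report(ranking: List[Tuple[str, float]], weights: Dict[str, float]) -> str:
    """Serialize the leaderboard together with the weights that produced it."""
    report = {
        "weights": weights,  # weighting assumptions recorded alongside the results
        "leaderboard": [
            {"rank": rank, "model": name, "score": score}
            for rank, (name, score) in enumerate(ranking, start=1)
        ],
    }
    # sort_keys keeps the serialized output stable across runs.
    return json.dumps(report, indent=2, sort_keys=True)
```

Running the name check before scoring surfaces mismatches such as "acc" versus "accuracy" early, and embedding the weights in the report lets a reader reproduce the ranking from the output alone.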

Metadata

Author: @0x-professor

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-0x-professor-ml-model-eval-benchmark": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

cyber-kev-triage

Prioritize vulnerability remediation using KEV-style exploitation context plus asset criticality. Use for CVE triage, patch order decisions, and remediation reporting.

0x-professor 4473

agentic-mcp-server-builder

Scaffold MCP server projects and baseline tool contract checks. Use for defining tool schemas, generating starter server layouts, and validating MCP-ready structure.

0x-professor 4473

cyber-ir-playbook

Build incident response timelines and report packs from event logs. Use for detection-to-recovery reporting, phase tracking, and stakeholder-ready incident summaries.

0x-professor 4473

cyber-owasp-review

Map application security findings to OWASP Top 10 categories and generate remediation checklists. Use for normalized AppSec review outputs and category-level prioritization.

0x-professor 4473

agentic-workflow-automation

Generate reusable multi-step agent workflow blueprints. Use for trigger/action orchestration, deterministic workflow definitions, and automation handoff artifacts.

0x-professor 4473