Official Verified developer tools Safety 4/5

expertpack-eval

Measure ExpertPack EK (Esoteric Knowledge) ratio and run automated quality evals. Use when: (1) Measuring what percentage of a pack's content frontier LLMs cannot produce on their own, (2) Running automated eval sets against a pack-powered agent with LLM-as-judge scoring. Requires OpenRouter API key (auto-resolved from OpenClaw auth or OPENROUTER_API_KEY env var). Companion to the main expertpack skill. Triggers on: 'EK ratio', 'measure EK', 'blind probe', 'eval expertpack', 'pack quality eval', 'run eval', 'esoteric knowledge ratio'. Note: packs are Obsidian-compatible — eval results (ek_score) can be added to file frontmatter and queried in Obsidian via Dataview.
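As an illustration of the frontmatter note above, here is a minimal Python sketch that writes an ek_score value into a file's YAML frontmatter. The helper name set_frontmatter_field and the sample value are hypothetical, and the sketch only handles simple single-line "key: value" entries; it does not reproduce how the skill itself writes results.

```python
def set_frontmatter_field(text: str, key: str, value: str) -> str:
    """Insert or update a single-line `key: value` entry in leading YAML frontmatter.

    Hypothetical helper for illustration; it does not parse nested YAML.
    """
    lines = text.split("\n")
    if lines and lines[0].strip() == "---":
        # Locate the closing frontmatter delimiter.
        for end in range(1, len(lines)):
            if lines[end].strip() == "---":
                break
        else:
            raise ValueError("unterminated frontmatter block")
        body = lines[1:end]
        prefix = f"{key}:"
        for i, line in enumerate(body):
            if line.startswith(prefix):
                body[i] = f"{key}: {value}"  # update the existing entry
                break
        else:
            body.append(f"{key}: {value}")  # add a new entry
        return "\n".join(["---", *body, "---", *lines[end + 1:]])
    # No frontmatter yet: create a new block above the content.
    return "\n".join(["---", f"{key}: {value}", "---", text])


if __name__ == "__main__":
    note = "---\ntitle: Refund policy\n---\nBody text."
    print(set_frontmatter_field(note, "ek_score", "0.72"))
```

With the field in frontmatter, the value becomes ordinary note metadata that Obsidian tools such as Dataview can read.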

What This Skill Does

The expertpack-eval skill is the measurement tool for the ExpertPack ecosystem. It lets developers and AI engineers quantify the value of a custom knowledge pack by calculating its Esoteric Knowledge (EK) ratio: the share of the pack's content that frontier LLMs (such as GPT-4, Claude 3.5, or Gemini) cannot produce from their training alone. To measure this, the skill runs automated 'blind probes': it asks a model questions without access to your pack, then uses an LLM-as-judge step to compare the answers to ground truth. The skill also runs automated eval sets against a pack-powered agent's endpoint, checking that the agent behaves reliably across question types, from basic and advanced queries to out-of-scope questions.
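The idea behind the EK ratio can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the skill's actual formula: the ProbeResult shape, the 0-to-1 judge scores, and the 0.5 "known" cutoff are all hypothetical. A proposition is treated as esoteric only if no probed model reproduced it without the pack.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One blind-probe outcome for a single pack proposition (hypothetical shape)."""
    proposition_id: str
    # Judge scores in [0, 1], one per probed model; 1.0 means the model
    # reproduced the ground truth without seeing the pack.
    judge_scores: list[float]


def ek_ratio(results: list[ProbeResult], known_threshold: float = 0.5) -> float:
    """Fraction of propositions that no probed model answered correctly.

    A proposition counts as "known" if any model's judge score reaches
    known_threshold (an assumed cutoff, not taken from the skill).
    """
    if not results:
        raise ValueError("need at least one probe result")
    esoteric = sum(
        1 for r in results
        if max(r.judge_scores, default=0.0) < known_threshold
    )
    return esoteric / len(results)


if __name__ == "__main__":
    sample = [
        ProbeResult("p1", [0.9, 0.8]),  # both models knew it
        ProbeResult("p2", [0.1, 0.0]),  # esoteric
        ProbeResult("p3", [0.2, 0.6]),  # one model knew it
        ProbeResult("p4", [0.0, 0.3]),  # esoteric
    ]
    print(f"EK ratio: {ek_ratio(sample):.2f}")  # prints "EK ratio: 0.50"
```

Counting a proposition as known when any single model answers it is the conservative choice: it keeps the ratio from crediting the pack for facts that at least one frontier model already has.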

Installation

To add this evaluation suite to your environment, install it with the clawhub CLI. The skill needs a valid OpenRouter API key because it calls external models to run its evaluations. Install the skill using the following command:

clawhub install openclaw/skills/skills/brianhearn/expertpack-eval

Once installed, export OPENROUTER_API_KEY in your environment, or verify your OpenClaw authentication profile so the skill can resolve the key automatically.
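A pre-flight check for the key might look like the following minimal sketch. It covers only the OPENROUTER_API_KEY environment variable named in the skill description; the function name resolve_openrouter_key is hypothetical, and the skill's own lookup through OpenClaw auth is not reproduced here.

```python
import os
from typing import Mapping, Optional


def resolve_openrouter_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenRouter key from the environment or raise a clear error.

    Hypothetical helper: only the environment-variable path is covered.
    """
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; export it or configure OpenClaw auth"
        )
    return key


if __name__ == "__main__":
    try:
        resolve_openrouter_key()
        print("OpenRouter key found in environment")
    except RuntimeError as err:
        print(f"Pre-flight check failed: {err}")
```

Failing early with a clear message avoids discovering a missing key partway through a long eval run.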

Use Cases

  • Quantifying Pack Value: Use the EK ratio calculation during development to prove the efficacy of niche or proprietary data sets.
  • Quality Assurance: Run baseline eval sets against your agent whenever you modify your pack's retrieval logic, instructions, or model configuration to prevent regressions.
  • Optimizing Agent Training: Use the 'out-of-scope' question results to check whether your agent hallucinates an answer when it should decline and state that the question is outside the pack's knowledge.
  • Deployment Readiness: Provide the generated YAML report as proof of quality before deploying a pack-powered agent to a production environment.
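The regression check described in the Quality Assurance use case can be sketched as a comparison of per-category mean scores between a baseline run and a current run. The category names, the 0-to-1 score scale, and the 0.05 tolerance below are assumptions for illustration; the skill's actual report format is not shown in this listing.

```python
from statistics import mean


def category_means(scores: dict[str, list[float]]) -> dict[str, float]:
    """Average judge score per question category, skipping empty categories."""
    return {cat: mean(vals) for cat, vals in scores.items() if vals}


def find_regressions(
    baseline: dict[str, list[float]],
    current: dict[str, list[float]],
    tolerance: float = 0.05,
) -> dict[str, tuple[float, float]]:
    """Return {category: (old_mean, new_mean)} where the mean dropped by more than tolerance."""
    base = category_means(baseline)
    cur = category_means(current)
    return {
        cat: (base[cat], cur[cat])
        for cat in base
        if cat in cur and base[cat] - cur[cat] > tolerance
    }


if __name__ == "__main__":
    # Hypothetical scores for two eval runs.
    baseline = {"basic": [1.0, 1.0], "out_of_scope": [0.8, 1.0]}
    current = {"basic": [1.0, 0.98], "out_of_scope": [0.5, 0.7]}
    for cat, (old, new) in find_regressions(baseline, current).items():
        print(f"regression in {cat}: {old:.2f} -> {new:.2f}")
```

A small tolerance keeps judge noise from flagging every run, while a real drop in one category (here, out-of-scope handling) still surfaces.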

Example Prompts

  1. "Measure the EK ratio for the current expertpack in my local directory to see how much unique knowledge it provides."
  2. "Run the blind probe eval on the new policy documentation pack using GPT-4-mini and Claude Sonnet as comparison models."
  3. "Initiate an automated quality eval against the active endpoint using the questions.yaml test set."

Tips & Limitations

  • Budget Awareness: Since this skill makes numerous calls to multiple frontier models for evaluation, keep an eye on your OpenRouter usage costs, especially when running large-scale eval sets with a high number of 'sample' iterations.
  • Judge Reliability: Prefer a strong model such as Claude Sonnet for the 'judge' role. Smaller models often score inconsistently, which can skew your EK ratio.
  • Data Privacy: Ensure that your eval test sets do not contain sensitive or PII data, as these queries are sent to external API endpoints for analysis during the evaluation process.
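For budget planning, a rough call count can be estimated before a run. The sketch below assumes every question is answered once per model per sample and every answer is judged once; the skill's real request pattern may differ (for example, if it batches judge calls), so treat the result as an estimate only.

```python
def estimate_calls(
    num_questions: int,
    num_models: int,
    samples_per_question: int,
    judge_calls_per_answer: int = 1,
) -> int:
    """Estimate total API calls: one per generated answer plus its judge calls.

    All parameters are assumptions about the run, not values read from the skill.
    """
    answers = num_questions * num_models * samples_per_question
    return answers + answers * judge_calls_per_answer


if __name__ == "__main__":
    # 50 questions, 3 comparison models, 5 samples each, judged once.
    print(estimate_calls(50, 3, 5))  # prints 1500
```

Multiplying the estimate by your models' per-call pricing gives a rough cost ceiling to compare against your OpenRouter budget.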

Metadata

Stars: 4190
Views: 1
Updated: 2026-04-18

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-brianhearn-expertpack-eval": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags (AI)

#evaluation #expertpack #llm-metrics #quality-assurance #data-analysis
Safety Score: 4/5

Flags: network-access, file-read, file-write, external-api, code-execution

Related Skills

expertpack

Work with ExpertPacks — structured knowledge packs for AI agents. Obsidian-compatible: every pack is a valid Obsidian vault with Dataview support. Use when: (1) Loading/consuming an ExpertPack as agent context, (2) Creating or hydrating a new ExpertPack from scratch, (3) Configuring RAG for a pack, (4) Opening or authoring a pack in Obsidian. Triggers on: 'expertpack', 'expert pack', 'esoteric knowledge', 'knowledge pack', 'pack hydration', 'obsidian vault', 'obsidian pack'. For CLI tools (ep-validate, ep-doctor, ep-graph-export, ep-strip-frontmatter) install expertpack-cli. For EK ratio measurement and quality evals install expertpack-eval. For exporting an OpenClaw agent as an ExpertPack install expertpack-export. For converting an existing Obsidian Vault into an ExpertPack install obsidian-to-expertpack. For serving any ExpertPack as an MCP endpoint (expertise-as-a-service), see EP MCP at github.com/brianhearn/ep-mcp.

brianhearn 4190

elite-to-expertpack

Convert Elite Longterm Memory data into a structured ExpertPack. Migrates the 5-layer memory system (SESSION-STATE hot RAM, LanceDB warm store, Git-Notes cold store, MEMORY.md curated archive, and daily journals) into ExpertPack's portable format with multi-layer retrieval, context tiers, and EK measurement. Output is Obsidian-compatible — includes YAML frontmatter on all content files and can be opened as an Obsidian vault. Use when: upgrading from Elite Longterm Memory to ExpertPack, backing up agent knowledge, or migrating to a new platform. Triggers on: 'elite to expertpack', 'convert elite memory', 'export elite memory', 'migrate elite longterm', 'upgrade memory to expertpack', 'elite memory export'.

brianhearn 4190

expertpack-cli

Run ExpertPack CLI tools for validating, fixing, graphing, and deploying packs. Use when: running ep-validate, ep-doctor, ep-graph-export, ep-strip-frontmatter, or ep-fix-broken-wikilinks on a local pack. Triggers on: 'validate pack', 'ep-validate', 'ep-doctor', 'fix pack errors', 'graph export', 'ep-graph-export', 'strip frontmatter', 'deploy pack', 'ep-strip-frontmatter'. Requires the ExpertPack repo cloned locally (github.com/brianhearn/ExpertPack) — tools live in tools/validator/.

brianhearn 4190

self-improving-to-expertpack

Convert Self-Improving Agent learnings into a structured ExpertPack. Migrates the .learnings/ directory (LEARNINGS.md, ERRORS.md, FEATURE_REQUESTS.md) and any promoted content from workspace files into ExpertPack's portable format with multi-layer retrieval, context tiers, and EK measurement. Output is Obsidian-compatible — includes YAML frontmatter on all content files and can be opened as an Obsidian vault. Use when: upgrading from Self-Improving Agent to ExpertPack, backing up agent learnings, exporting accumulated knowledge, or migrating to a new platform. Triggers on: 'self-improving to expertpack', 'convert self-improving', 'export learnings', 'migrate self-improving', 'learnings to expertpack', 'convert learnings to pack'.

brianhearn 4190

obsidian-to-expertpack

Convert an existing Obsidian Vault into an agent-ready ExpertPack. Restructures vault content for EK optimization, RAG retrieval, and OpenClaw integration. Creates a copy — source vault is never modified. Use when: a user wants to make their Obsidian Vault usable by AI agents, convert OV to EP, drop their vault into OpenClaw as a knowledge pack, or make their notes RAG-ready. Triggers on: 'obsidian to expertpack', 'obsidian vault to ep', 'convert obsidian', 'OV to EP', 'obsidian agent ready', 'make my vault ai ready', 'obsidian knowledge pack', 'obsidian rag'.

brianhearn 4190