ClawKit Reliability Toolkit
Official · Verified · developer-tools · Safety 5/5

skill-evaluator

Evaluate Clawdbot skills for quality, reliability, and publish-readiness using a multi-framework rubric (ISO 25010, OpenSSF, Shneiderman, agent-specific heuristics). Use when asked to review, audit, evaluate, score, or assess a skill before publishing, or when checking skill quality. Runs automated structural checks and guides manual assessment across 25 criteria.

Why use this skill?

Assess and improve your Clawdbot skills using our multi-framework evaluation tool. Get automated structural checks and detailed scoring for high-quality, reliable agent development.


What This Skill Does

The skill-evaluator is the cornerstone of the OpenClaw quality-assurance pipeline. It acts as a comprehensive auditor for any Clawdbot skill, helping developers meet high standards of reliability, performance, and security before deployment. By combining a multi-framework rubric (ISO 25010 for software quality, OpenSSF guidelines for security, and Shneiderman's principles for human-computer interaction), the evaluator checks that your skills are not just functional but enterprise-grade. It performs automated structural analysis to verify file integrity and dependency health, and guides a deeper manual assessment of nuanced agent-specific behaviors such as trigger precision and idempotency.

Installation

To add this auditor to your local OpenClaw development environment, run the following command in your terminal:

clawhub install openclaw/skills/skills/terwox/skill-evaluator

Ensure you have the required Python environment configured, as the automated structural checks rely on a script-based execution flow. Once installed, the tool adds the eval-skill.py utility to your PATH, allowing you to scan project directories instantly.
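The exact rubric implemented by eval-skill.py is internal to the tool, but the kind of automated structural check it describes can be sketched in a few lines. The file names and rules below (a required SKILL.md manifest, a naive credential-string pattern) are illustrative assumptions, not the script's actual logic:

```python
import re
from pathlib import Path

# Hypothetical structural rules; the real eval-skill.py rubric differs.
REQUIRED_FILES = ["SKILL.md"]  # assumed manifest name
SECRET_PATTERN = re.compile(r"(api[_-]?key|secret|token)\s*[:=]", re.I)

def structural_check(skill_dir: str) -> list[str]:
    """Return a list of human-readable findings for a skill directory."""
    root = Path(skill_dir)
    findings = []
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            findings.append(f"missing required file: {name}")
    # Scan text-like files for obvious hard-coded credentials.
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in {".md", ".py", ".json"}:
            if SECRET_PATTERN.search(path.read_text(errors="ignore")):
                findings.append(f"possible credential leak: {path.name}")
    return findings
```

An empty findings list would correspond to a clean automated pass; the manual portion of the 25-criteria assessment still has to be done by a reviewer.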

Use Cases

  • Pre-Publishing Audit: Automatically scan and score a skill before submitting it to the public registry to identify potential P0/P1 blockers.
  • Quality Benchmarking: Periodically run assessments on existing internal skills to ensure they continue to meet evolving safety and performance standards.
  • Code Review Assistance: Use the automated output to provide developers with immediate feedback on structural errors, missing documentation, or credential leaks.
  • Security Hardening: Identify potential vulnerabilities during the early stages of development, complementing advanced tools like SkillLens.

Example Prompts

  1. "Evaluate the quality of the 'data-summarizer' skill in my current directory and give me a summary of findings."
  2. "Perform an automated structural check on the '/projects/clawdbot/browser-navigator' skill and output the results in JSON format."
  3. "Help me conduct a manual assessment for the new 'calendar-scheduler' skill based on the 25 criteria rubric."

Tips & Limitations

  • Automate First: Run evaluations with the --json flag so the results can be consumed by CI/CD pipelines.
  • Manual Review is Key: The automated scripts catch syntax and structural issues, but only human judgment can truly evaluate 'Usability' and 'Agent-Specific' heuristics like user intent alignment.
  • Complementary Scanning: This tool is a quality-assurance baseline; for high-stakes environments, always follow up with specialized security scanning tools to detect complex prompt injection or exfiltration risks.
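To make the CI/CD integration concrete, here is a hedged sketch of how a pipeline step could gate publishing on the evaluator's JSON output. The field names used here (`blockers`, `severity`) are assumptions about the report shape, not a documented schema:

```python
import json

# Example report as the --json flag might emit it; the real schema may differ.
report_json = """
{
  "skill": "data-summarizer",
  "score": 4.2,
  "blockers": [
    {"severity": "P1", "message": "README missing usage examples"}
  ]
}
"""

def has_publish_blockers(report: dict) -> bool:
    """Fail the pipeline if any P0/P1 finding is present."""
    return any(b["severity"] in ("P0", "P1") for b in report.get("blockers", []))

report = json.loads(report_json)
if has_publish_blockers(report):
    print(f"publish blocked for {report['skill']}")
```

A step like this catches the automated P0/P1 blockers early; the usability and agent-specific criteria still require the manual review noted above.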

Metadata

Author: @terwox
Stars: 946
Views: 0
Updated: 2026-02-13
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-terwox-skill-evaluator": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#quality-assurance #audit #developer-tools #compliance
Safety Score: 5/5

Flags: file-read, code-execution