skill-evaluator
Evaluate Clawdbot skills for quality, reliability, and publish-readiness using a multi-framework rubric (ISO 25010, OpenSSF, Shneiderman, agent-specific heuristics). Use when asked to review, audit, evaluate, score, or assess a skill before publishing, or when checking skill quality. Runs automated structural checks and guides manual assessment across 25 criteria.
Why use this skill?
Assess and improve your Clawdbot skills using our multi-framework evaluation tool. Get automated structural checks and detailed scoring for high-quality, reliable agent development.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/terwox/skill-evaluator
What This Skill Does
The skill-evaluator is the cornerstone of the OpenClaw quality assurance pipeline. It acts as a comprehensive auditor for any Clawdbot skill, helping developers meet high standards of reliability, performance, and security before deployment. Its multi-framework rubric (ISO 25010 for software quality, OpenSSF for security standards, and Shneiderman's principles for human-computer interaction) verifies that your skills are not just functional but enterprise-grade. The tool performs automated structural analysis to check file integrity and dependency health, and guides a deep-dive manual assessment of nuanced agent-specific behaviors like trigger precision and idempotency.
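To make the automated pass concrete, here is a minimal Python sketch of the kind of structural checks it performs. The file-layout convention (a SKILL.md manifest) and the credential-leak pattern are assumptions for illustration, not the tool's actual rules.

# Illustrative sketch only: approximates the kind of structural checks the
# automated pass performs. File names and rules are assumptions, not the
# tool's real logic.
from pathlib import Path
import json, re

def structural_check(skill_dir: str) -> dict:
    root = Path(skill_dir)
    findings = []
    # Assumed convention: every skill ships a SKILL.md manifest.
    if not (root / "SKILL.md").is_file():
        findings.append({"severity": "P0", "issue": "missing SKILL.md"})
    # Crude credential-leak heuristic; the pattern is illustrative only.
    secret = re.compile(r"(api[_-]?key|secret|token)\s*[:=]\s*\S+", re.I)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in {".py", ".md", ".json", ".sh"}:
            if secret.search(path.read_text(errors="ignore")):
                findings.append({"severity": "P1",
                                 "issue": f"possible credential in {path.name}"})
    return {"skill": root.name, "findings": findings}

if __name__ == "__main__":
    print(json.dumps(structural_check("./my-skill"), indent=2))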
Installation
To add this auditor to your local OpenClaw development environment, run the following command in your terminal:
clawhub install openclaw/skills/skills/terwox/skill-evaluator
Ensure you have the required Python environment configured, as the automated structural checks rely on a script-based execution flow. Once installed, the tool adds the eval-skill.py utility to your PATH, allowing you to scan project directories instantly.
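Once it is on your PATH, you can point the utility at a skill directory. The argument syntax below is an assumption (a positional target directory); check the skill's own documentation if it differs:

eval-skill.py ./my-skill

Add the --json flag (see Tips below) when you need machine-readable output.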
Use Cases
- Pre-Publishing Audit: Automatically scan and score a skill before submitting it to the public registry to identify potential P0/P1 blockers.
- Quality Benchmarking: Periodically run assessments on existing internal skills to ensure they continue to meet evolving safety and performance standards.
- Code Review Assistance: Use the automated output to provide developers with immediate feedback on structural errors, missing documentation, or credential leaks.
- Security Hardening: Identify potential vulnerabilities during the early stages of development, complementing advanced tools like SkillLens.
Example Prompts
- "Evaluate the quality of the 'data-summarizer' skill in my current directory and give me a summary of findings."
- "Perform an automated structural check on the '/projects/clawdbot/browser-navigator' skill and output the results in JSON format."
- "Help me conduct a manual assessment for the new 'calendar-scheduler' skill based on the 25 criteria rubric."
Tips & Limitations
- Automate First: Run the automated checks with the --json flag so results can be fed into CI/CD pipelines (a sketch of such a gate follows this list).
- Manual Review is Key: The automated scripts catch syntax and structural issues, but only human judgment can truly evaluate 'Usability' and 'Agent-Specific' heuristics like user intent alignment.
- Complementary Scanning: This tool is a quality-assurance baseline; for high-stakes environments, always follow up with specialized security scanning tools to detect complex prompt injection or exfiltration risks.
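As a concrete sketch of the CI/CD gate suggested in the first tip, the snippet below shells out to eval-skill.py with --json and fails the build on P0/P1 findings. The report schema (a findings list with severity and issue fields) is an assumption; adapt the keys to the actual output.

# Hypothetical CI gate: fail the build on P0/P1 blockers in the JSON report.
# The report schema used here is an assumption, not the documented format.
import json, subprocess, sys

result = subprocess.run(
    ["eval-skill.py", "./my-skill", "--json"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)
blockers = [f for f in report.get("findings", [])
            if f.get("severity") in {"P0", "P1"}]
if blockers:
    for f in blockers:
        print(f"[{f['severity']}] {f['issue']}", file=sys.stderr)
    sys.exit(1)
print("No publish blockers found.")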
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-terwox-skill-evaluator": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags: AI
Flags: file-read, code-execution