evaluate-presets

Use when testing Ralph's hat collection presets, validating preset configurations, or auditing the preset library for bugs and UX issues.

Why use this skill?

Use the evaluate-presets skill to test, validate, and audit your OpenClaw hat collection configurations. Ensure quality with automated CLI-based testing tools.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/paulpete/evaluate-presets

What This Skill Does

The evaluate-presets skill is the primary testing and validation framework for the OpenClaw hat collection infrastructure. It lets developers and power users programmatically verify that Ralph's hat configurations are functional, stable, and perform as expected under specific operational constraints. Through shell-based scripts, the skill provides a direct, high-fidelity interface for triggering the evaluation pipeline, capturing metrics such as session duration, event publication frequency, and per-hat activation rates. It also acts as a quality gate, ensuring that any modifications to routing logic or preset files are validated against the full suite of operational tasks before deployment.
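
As a rough illustration, a single run might be triggered from the shell as below. The script name, flags, and values are assumptions made for this sketch, not documented interfaces of the skill:

# Hypothetical entry point -- substitute the script actually shipped with the skill.
# --preset runs one preset in isolation; --backend selects the agent backend;
# --out points at the metrics directory mentioned under Tips & Limitations.
./tools/evaluate-presets.sh --preset tdd-red-green --backend claude --out .eval/logs/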

Installation

To install this skill, use the ClawHub CLI tool in your terminal:

clawhub install openclaw/skills/skills/paulpete/evaluate-presets

Ensure your local environment has the necessary permissions to execute bash scripts within the OpenClaw directory structure, as the tool relies on access to internal directories such as .eval/ and the /tools repository.
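
If script execution fails after installation, a permission check is a quick first step. The install path below is an assumption; check your ClawHub configuration for the real location:

# Hypothetical install location -- adjust to wherever ClawHub places skills.
ls -l ~/.clawhub/skills/paulpete/evaluate-presets/tools/
chmod +x ~/.clawhub/skills/paulpete/evaluate-presets/tools/*.sh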

Use Cases

  • Continuous Integration/Deployment: Use this tool after updating hat configurations to ensure no regressions were introduced.
  • Quality Assurance Audits: Run the full suite periodically to monitor the health of the preset library and identify performance degradation over time.
  • Preset Debugging: Isolate specific problematic presets by running them individually to inspect log files, session JSONL records, and environmental runtime data (see the sketch after this list).
  • UX Validation: Analyze captured metrics to see if hat activation behavior matches user expectations in a controlled test environment.
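
For the debugging case above, session JSONL records can be mined directly from the shell. The file path and field names ("event", "hat") are illustrative guesses; inspect your own .eval/logs/ output for the actual schema:

# Count activations per hat in a captured session log (hypothetical schema).
jq -r 'select(.event == "hat_activated") | .hat' .eval/logs/session.jsonl | sort | uniq -c | sort -rn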

Example Prompts

  1. "Run a full evaluation of all hat presets using the Kiro backend to verify the latest configuration audit."
  2. "Execute the tdd-red-green preset evaluation using the Claude backend and keep it running in the background while I work on other tasks."
  3. "Check the latest TaskOutput results for the recent preset suite evaluation to see if any tests failed or recorded anomalies."

Tips & Limitations

  • Always use Background Mode: Because preset evaluations can run for extended periods (potentially several hours for a full suite), always set run_in_background: true when invoking via the Bash tool to prevent blocking your agent session; a sketch of an equivalent shell invocation follows this list.
  • Resource Monitoring: Since these scripts perform intensive disk I/O and process spawning, monitor your system resources if running the full suite locally.
  • Timeout Limits: The scripts are configured with a 10-minute timeout for individual sessions. If you are running an exceptionally large custom task, ensure your environment settings allow for this duration.
  • Metrics Archiving: The skill creates a structured directory at .eval/logs/. Be aware that running frequent tests will accumulate log data over time; perform periodic cleanup of older logs to save disk space.
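
As referenced in the first tip, here is a minimal sketch of a non-blocking run plus a log-cleanup pass. The script path is an assumption, while .eval/logs/ is the directory named above:

# Launch the full suite detached so it does not block the agent session
# (shell equivalent of run_in_background: true). Script path is hypothetical.
nohup ./tools/evaluate-presets.sh --all > .eval/logs/run.out 2>&1 &

# Prune logs older than 14 days; the retention window is a local choice,
# not a requirement of the skill.
find .eval/logs/ -type f -mtime +14 -delete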

Metadata

Author: @paulpete
Stars: 1217
Views: 0
Updated: 2026-02-20

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-paulpete-evaluate-presets": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Tags

#testing #automation #debugging #qa #devops
Safety Score: 4/5

Flags: file-write, file-read, code-execution