evaluate-presets
Use when testing Ralph's hat collection presets, validating preset configurations, or auditing the preset library for bugs and UX issues.
Why use this skill?
Use the evaluate-presets skill to test, validate, and audit your OpenClaw hat collection configurations. Ensure quality with automated CLI-based testing tools.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/paulpete/evaluate-presets
What This Skill Does
The evaluate-presets skill is the primary testing and validation framework for the OpenClaw hat collection infrastructure. It lets developers and power users programmatically verify that Ralph's hat configurations are functional, stable, and perform as expected under specific operational constraints. Its shell-based scripts provide a direct, high-fidelity interface for triggering the evaluation pipeline, capturing metrics such as session duration, event publication frequency, and individual hat activation rates. The skill acts as a quality gate, ensuring that any modifications to routing logic or preset files are validated against the full suite of operational tasks before deployment.
Installation
To install this skill, use the ClawHub CLI tool in your terminal:
clawhub install openclaw/skills/skills/paulpete/evaluate-presets
Ensure your local environment has the necessary permissions to execute bash scripts within the OpenClaw directory structure, as the tool relies on accessing internal directories like .eval/ and the /tools repository.
Use Cases
- Continuous Integration/Deployment: Use this tool after updating hat configurations to ensure no regressions were introduced.
- Quality Assurance Audits: Run the full suite periodically to monitor the health of the preset library and identify performance degradation over time.
- Preset Debugging: Isolate specific problematic presets by running them individually to inspect log files, session JSONL records, and environmental runtime data.
- UX Validation: Analyze captured metrics to see if hat activation behavior matches user expectations in a controlled test environment.
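For the preset-debugging use case, session records are stored as JSONL, which is easy to scan line by line from a shell. The sketch below creates a small sample record first, because the skill's actual file names and event schema are not documented here; the `hat_activated` field is purely illustrative:

```shell
# Create a sample session record to illustrate the format (fields are illustrative).
mkdir -p .eval/logs
cat > .eval/logs/sample-session.jsonl <<'EOF'
{"event": "hat_activated", "hat": "tdd-red-green", "ts": 1}
{"event": "task_complete", "hat": "tdd-red-green", "ts": 2}
EOF

# Count hat activation events in the session record.
ACTIVATIONS=$(grep -c '"hat_activated"' .eval/logs/sample-session.jsonl)
echo "hat activations: $ACTIVATIONS"
```

Because each JSONL line is a standalone JSON object, the same pattern scales to `jq`-based queries over real session files once you know their schema.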
Example Prompts
- "Run a full evaluation of all hat presets using the Kiro backend to verify the latest configuration audit."
- "Execute the tdd-red-green preset evaluation using the Claude backend and keep it running in the background while I work on other tasks."
- "Check the latest TaskOutput results for the recent preset suite evaluation to see if any tests failed or recorded anomalies."
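A background run like the second prompt describes can also be launched directly from a shell. This is a minimal sketch of the pattern, not the skill's actual entry point: the workload below is a placeholder (`sleep`) standing in for the real evaluation script, whose path is not documented here:

```shell
# Run a long evaluation-style job in the background and capture its log.
# The bash -c workload is a stand-in; substitute the real evaluation script.
mkdir -p .eval/logs
LOG=".eval/logs/tdd-red-green.log"
nohup bash -c 'echo "evaluation started"; sleep 1; echo "evaluation done"' \
  > "$LOG" 2>&1 &
EVAL_PID=$!
wait "$EVAL_PID"             # or poll later with: kill -0 "$EVAL_PID"
grep -q "evaluation done" "$LOG" && echo "preset evaluation completed"
```

Keeping the PID around lets you check on the run later instead of blocking on `wait`, which mirrors the `run_in_background: true` behavior described in the tips below.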
Tips & Limitations
- Always Use Background Mode: Because preset evaluations can run for extended periods (potentially several hours for a full suite), always set run_in_background: true when invoking via the Bash tool to prevent blocking your agent session.
- Resource Monitoring: Since these scripts perform intensive disk I/O and process spawning, monitor your system resources if running the full suite locally.
- Timeout Limits: The scripts are configured with a 10-minute timeout for individual sessions. If you are running an exceptionally large custom task, ensure your environment settings allow for this duration.
- Metrics Archiving: The skill writes structured logs to .eval/logs/. Frequent test runs accumulate log data over time, so periodically clean up older logs to reclaim disk space.
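One way to do that periodic cleanup is with `find`; the 14-day threshold below is an arbitrary example, not a limit imposed by the skill. The two `touch` lines only exist to give the prune rule something to match in a fresh directory:

```shell
LOGDIR=".eval/logs"
mkdir -p "$LOGDIR"
# Demo setup: backdate one log so the prune rule has a stale file to delete.
# (touch -d is GNU coreutils; on BSD/macOS use touch -t instead.)
touch -d "30 days ago" "$LOGDIR/old-run.log"
touch "$LOGDIR/new-run.log"

# Delete evaluation logs older than 14 days.
find "$LOGDIR" -name "*.log" -type f -mtime +14 -delete
ls "$LOGDIR"
```

Run it from the repository root so the relative .eval/logs/ path resolves correctly, or wire it into a cron job with an absolute path.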
Metadata
Paste this into your clawhub.json to enable this plugin:
{
"plugins": {
"official-paulpete-evaluate-presets": {
"enabled": true,
"auto_update": true
}
}
}
Tags: AI
Flags: file-write, file-read, code-execution
Related Skills
create-hat-collection
Generates new Ralph hat collection presets through guided conversation. Asks clarifying questions, validates against schema constraints, and outputs production-ready YAML files.
playwriter
Browser automation via Playwriter (remorses) using persistent Chrome sessions and the full Playwright Page API.
code-task-generator
Generates structured .code-task.md files from descriptions or PDD implementation plans. Auto-detects input type, creates properly formatted tasks with Given-When-Then acceptance criteria.
release-bump
Use when bumping the ralph-orchestrator version for a new release, after fixes are committed and ready to publish.
tmux-terminal
Interactive terminal control via tmux for TUI apps, prompts, and long-running CLI workflows.