evaluate-presets
Use when testing Ralph's hat collection presets, validating preset configurations, or auditing the preset library for bugs and UX issues.
Why use this skill?
Use the evaluate-presets skill to test, validate, and audit your OpenClaw hat collection configurations. Ensure quality with automated CLI-based testing tools.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/paulpete/evaluate-presets
What This Skill Does
The evaluate-presets skill is the primary testing and validation framework for the OpenClaw hat collection infrastructure. It lets developers and power users programmatically verify that Ralph's hat configurations are functional, stable, and perform as expected under specific operational constraints. Its shell-based scripts provide a direct, high-fidelity interface for triggering the evaluation pipeline, capturing metrics such as session duration, event publication frequency, and individual hat activation rates. The skill acts as a quality gate, ensuring that any modifications to routing logic or preset files are validated against the full suite of operational tasks before deployment.
Installation
To install this skill, use the ClawHub CLI tool in your terminal:
clawhub install openclaw/skills/skills/paulpete/evaluate-presets
Ensure your local environment has the necessary permissions to execute bash scripts within the OpenClaw directory structure, as the tool relies on accessing internal directories like .eval/ and the /tools repository.
Use Cases
- Continuous Integration/Deployment: Use this tool after updating hat configurations to ensure no regressions were introduced.
- Quality Assurance Audits: Run the full suite periodically to monitor the health of the preset library and identify performance degradation over time.
- Preset Debugging: Isolate specific problematic presets by running them individually to inspect log files, session JSONL records, and environmental runtime data.
- UX Validation: Analyze captured metrics to see if hat activation behavior matches user expectations in a controlled test environment.
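For the preset-debugging use case, session records are stored as JSONL, which is easy to scan line by line from a shell. The sketch below creates a small sample record first, because the skill's actual file names and event schema are not documented here; the `hat_activated` field is purely illustrative:

```shell
# Create a sample session record to illustrate the format (fields are illustrative).
mkdir -p .eval/logs
cat > .eval/logs/sample-session.jsonl <<'EOF'
{"event": "hat_activated", "hat": "tdd-red-green", "ts": 1}
{"event": "task_complete", "hat": "tdd-red-green", "ts": 2}
EOF

# Count hat activation events in the session record.
ACTIVATIONS=$(grep -c '"hat_activated"' .eval/logs/sample-session.jsonl)
echo "hat activations: $ACTIVATIONS"
```

Because each JSONL line is a standalone JSON object, the same pattern scales to `jq`-based queries over real session files once you know their schema.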
Example Prompts
- "Run a full evaluation of all hat presets using the Kiro backend to verify the latest configuration audit."
- "Execute the tdd-red-green preset evaluation using the Claude backend and keep it running in the background while I work on other tasks."
- "Check the latest TaskOutput results for the recent preset suite evaluation to see if any tests failed or recorded anomalies."
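A background run like the second prompt describes can also be launched directly from a shell. This is a minimal sketch of the pattern, not the skill's actual entry point: the workload below is a placeholder (`sleep`) standing in for the real evaluation script, whose path is not documented here:

```shell
# Run a long evaluation-style job in the background and capture its log.
# The bash -c workload is a stand-in; substitute the real evaluation script.
mkdir -p .eval/logs
LOG=".eval/logs/tdd-red-green.log"
nohup bash -c 'echo "evaluation started"; sleep 1; echo "evaluation done"' \
  > "$LOG" 2>&1 &
EVAL_PID=$!
wait "$EVAL_PID"             # or poll later with: kill -0 "$EVAL_PID"
grep -q "evaluation done" "$LOG" && echo "preset evaluation completed"
```

Keeping the PID around lets you check on the run later instead of blocking on `wait`, which mirrors the `run_in_background: true` behavior described in the tips below.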
Tips & Limitations
- Always Use Background Mode: Because preset evaluations can run for extended periods (potentially several hours for a full suite), always set run_in_background: true when invoking via the Bash tool to prevent blocking your agent session.
- Resource Monitoring: Since these scripts perform intensive disk I/O and process spawning, monitor your system resources if running the full suite locally.
- Timeout Limits: The scripts are configured with a 10-minute timeout for individual sessions. If you are running an exceptionally large custom task, ensure your environment settings allow for this duration.
- Metrics Archiving: The skill writes structured logs to .eval/logs/. Frequent test runs accumulate log data over time, so periodically clean up older logs to reclaim disk space.
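One way to do that periodic cleanup is with `find`; the 14-day threshold below is an arbitrary example, not a limit imposed by the skill. The two `touch` lines only exist to give the prune rule something to match in a fresh directory:

```shell
LOGDIR=".eval/logs"
mkdir -p "$LOGDIR"
# Demo setup: backdate one log so the prune rule has a stale file to delete.
# (touch -d is GNU coreutils; on BSD/macOS use touch -t instead.)
touch -d "30 days ago" "$LOGDIR/old-run.log"
touch "$LOGDIR/new-run.log"

# Delete evaluation logs older than 14 days.
find "$LOGDIR" -name "*.log" -type f -mtime +14 -delete
ls "$LOGDIR"
```

Run it from the repository root so the relative .eval/logs/ path resolves correctly, or wire it into a cron job with an absolute path.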
Metadata
Paste this into your clawhub.json to enable this plugin:
{
"plugins": {
"official-paulpete-evaluate-presets": {
"enabled": true,
"auto_update": true
}
}
}
Tags: AI
Flags: file-write, file-read, code-execution
Related Skills
create-hat-collection
Generates new Ralph hat collection presets through guided conversation. Asks clarifying questions, validates against schema constraints, and outputs production-ready YAML files.
playwriter
Browser automation via Playwriter (remorses) using persistent Chrome sessions and the full Playwright Page API.
code-task-generator
Generates structured .code-task.md files from descriptions or PDD implementation plans. Auto-detects input type, creates properly formatted tasks with Given-When-Then acceptance criteria.
release-bump
Use when bumping the ralph-orchestrator version for a new release, after fixes are committed and ready to publish.
tmux-terminal
Interactive terminal control via tmux for TUI apps, prompts, and long-running CLI workflows.