Ragaai Catalyst
Python SDK for an agentic AI observability, monitoring, and evaluation framework. Topics: ragaai catalyst, python, agentic-ai.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/bytesagain1/rag-evaluator
Rag Evaluator
AI-powered RAG (Retrieval-Augmented Generation) evaluation toolkit. Configure, benchmark, compare, and optimize your RAG pipelines from the command line. Track prompts, evaluations, fine-tuning experiments, costs, and usage — all with persistent local logging and full export capabilities.
Commands
Run rag-evaluator <command> [args] to use.
| Command | Description |
|---|---|
| configure | Configure RAG evaluation settings and parameters |
| benchmark | Run benchmarks against your RAG pipeline |
| compare | Compare results across different RAG configurations |
| prompt | Log and manage prompt templates and variations |
| evaluate | Evaluate RAG output quality and relevance |
| fine-tune | Track fine-tuning experiments and parameters |
| analyze | Analyze evaluation results and identify patterns |
| cost | Track and log API/inference costs |
| usage | Monitor token usage and API call volumes |
| optimize | Log optimization strategies and results |
| test | Run test cases against RAG configurations |
| report | Generate evaluation reports |
| stats | Show summary statistics across all categories |
| export <fmt> | Export data in json, csv, or txt format |
| search <term> | Search across all logged entries |
| recent | Show recent activity from history log |
| status | Health check: version, data dir, disk usage |
| help | Show help and available commands |
| version | Show version (v2.0.0) |
Each domain command (configure, benchmark, compare, etc.) works in two modes:
- Without arguments: displays the most recent 20 entries from that category
- With arguments: logs the input with a timestamp and saves to the category log file
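The two modes can be modeled in a few lines of Python. This is only an illustrative sketch of the behavior described above, not the tool's actual Bash implementation: the helper name `domain_command`, the timestamp layout, and the caller-supplied data directory are assumptions, and the `<category>.log` file name follows the `configure.log` pattern mentioned under Data Storage.

```python
from datetime import datetime
from pathlib import Path


def domain_command(data_dir: Path, category: str, *args: str) -> list[str]:
    """Model one domain command (hypothetical helper, not part of rag-evaluator)."""
    data_dir.mkdir(parents=True, exist_ok=True)
    log_file = data_dir / f"{category}.log"

    if not args:
        # Mode 1, no arguments: return the 20 most recent entries.
        if not log_file.exists():
            return []
        return log_file.read_text(encoding="utf-8").splitlines()[-20:]

    # Mode 2, with arguments: append a timestamped entry to the category log.
    # The timestamp layout is an assumption; the source only says "timestamp|value".
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    entry = f"{stamp}|{' '.join(args)}"
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
    return [entry]
```

Calling `domain_command(some_dir, "evaluate", "score=0.8")` would append one line to `evaluate.log`, and calling it again without extra arguments would return up to the last 20 lines of that file.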
Data Storage
All data is stored locally in ~/.local/share/rag-evaluator/:
- Each command creates its own log file (e.g., configure.log, benchmark.log)
- A unified history.log tracks all activity across commands
- Entries are stored in timestamp|value pipe-delimited format
- Export supports JSON, CSV, and plain text formats
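Because entries are plain `timestamp|value` lines, they are easy to post-process outside the tool. The following Python sketch parses that format and renders it as JSON or CSV. The function names, the `timestamp` and `value` field names, and the sample data are assumptions made for illustration; the schema that `rag-evaluator export` actually produces is not documented here.

```python
import csv
import io
import json


def parse_entries(text: str) -> list[dict[str, str]]:
    """Parse 'timestamp|value' lines into dictionaries, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first pipe only, so a value may itself contain '|'.
        timestamp, _, value = line.partition("|")
        entries.append({"timestamp": timestamp, "value": value})
    return entries


def to_json(entries: list[dict[str, str]]) -> str:
    """Render entries as a JSON array (field names are assumed, not the tool's)."""
    return json.dumps(entries, indent=2)


def to_csv(entries: list[dict[str, str]]) -> str:
    """Render entries as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["timestamp", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()


# Made-up sample data in the documented pipe-delimited layout.
sample = "2024-01-01 10:00|top_k=5\n2024-01-01 10:05|chunk=512|overlap=64\n"
entries = parse_entries(sample)
```

Splitting on the first pipe only is a deliberate choice in this sketch: it keeps any later `|` characters as part of the logged value instead of silently dropping them.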
Requirements
- Bash 4+ with set -euo pipefail strict mode
- Standard Unix utilities: date, wc, du, tail, grep, sed, cat
- No external dependencies or API keys required
When to Use
- Evaluating RAG pipeline quality — log evaluation scores, compare retrieval strategies, and track improvements over time
- Benchmarking different configurations — run benchmarks across embedding models, chunk sizes, or retrieval methods and compare results side by side
- Tracking costs and usage — monitor API costs and token usage across experiments to stay within budget
- Managing prompt engineering — log prompt variations, test them against your pipeline, and analyze which templates perform best
- Generating reports for stakeholders — export evaluation data as JSON/CSV for dashboards, or generate text reports summarizing RAG performance
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-bytesagain1-rag-evaluator": {
      "enabled": true,
      "auto_update": true
    }
  }
}