Official Verified

Ragaai Catalyst

Python SDK for an agent AI observability, monitoring, and evaluation framework. Tags: ragaai catalyst, python, agentic-ai.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/bytesagain1/rag-evaluator


Rag Evaluator

A command-line toolkit for RAG (Retrieval-Augmented Generation) evaluation. Configure, benchmark, compare, and optimize your RAG pipelines from the terminal, and track prompts, evaluations, fine-tuning experiments, costs, and usage, all with persistent local logging and full export capabilities.

Commands

Usage: `rag-evaluator <command> [args]`

Command	Description
`configure`	Configure RAG evaluation settings and parameters
`benchmark`	Run benchmarks against your RAG pipeline
`compare`	Compare results across different RAG configurations
`prompt`	Log and manage prompt templates and variations
`evaluate`	Evaluate RAG output quality and relevance
`fine-tune`	Track fine-tuning experiments and parameters
`analyze`	Analyze evaluation results and identify patterns
`cost`	Track and log API/inference costs
`usage`	Monitor token usage and API call volumes
`optimize`	Log optimization strategies and results
`test`	Run test cases against RAG configurations
`report`	Generate evaluation reports
`stats`	Show summary statistics across all categories
`export <fmt>`	Export data in json, csv, or txt format
`search <term>`	Search across all logged entries
`recent`	Show recent activity from history log
`status`	Health check — version, data dir, disk usage
`help`	Show help and available commands
`version`	Show version (v2.0.0)
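The `stats`, `search`, and `status` commands all read the per-category log files described under Data Storage. The Python sketch below shows one way they could work, assuming a single data directory holding `<category>.log` files of `timestamp|value` lines. The function names, the case-insensitive matching, and skipping `history.log` (assumed to repeat every category entry) are illustrative assumptions, not the tool's actual Bash implementation.

```python
from pathlib import Path

VERSION = "2.0.0"  # taken from the `version` row of the command table
HISTORY = "history.log"  # assumed to duplicate category entries, so skipped below


def stats(data_dir: Path) -> dict[str, int]:
    """Hypothetical helper: count non-blank entries in each category log."""
    counts: dict[str, int] = {}
    for log in sorted(data_dir.glob("*.log")):
        if log.name == HISTORY:
            continue
        with log.open(encoding="utf-8") as fh:
            counts[log.stem] = sum(1 for line in fh if line.strip())
    return counts


def search(data_dir: Path, term: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (category, line) pairs containing term, ignoring case."""
    needle = term.lower()
    hits: list[tuple[str, str]] = []
    for log in sorted(data_dir.glob("*.log")):
        if log.name == HISTORY:
            continue
        for line in log.read_text(encoding="utf-8").splitlines():
            if needle in line.lower():
                hits.append((log.stem, line))
    return hits


def status(data_dir: Path) -> dict[str, object]:
    """Rough health check; sums apparent file sizes rather than du's block usage."""
    size = sum(p.stat().st_size for p in data_dir.rglob("*") if p.is_file())
    return {"version": VERSION, "data_dir": str(data_dir), "bytes": size}
```

Note that `du` reports allocated disk blocks, so the byte total here will usually be smaller than what a `du`-based `status` prints.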

Each domain command (configure, benchmark, compare, etc.) works in two modes:

Without arguments: displays the most recent 20 entries from that category
With arguments: logs the input with a timestamp and saves to the category log file
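A minimal Python sketch of the two-mode behavior described above. It follows the documented contract (show the last 20 entries, or append a timestamped entry to the category log and to history.log), but the timestamp layout (`%Y-%m-%d %H:%M:%S`), the `category: value` shape of history lines, and the function name `run_domain_command` are hypothetical choices, not the tool's real code.

```python
from datetime import datetime
from pathlib import Path

RECENT_LIMIT = 20  # "most recent 20 entries" per the description above


def run_domain_command(data_dir: Path, category: str, args: list[str]) -> list[str]:
    """Sketch of a domain command such as `benchmark`.

    With no args, return the last RECENT_LIMIT lines of <category>.log.
    With args, join them into one value, prefix a timestamp, and append the
    entry to both the category log and the unified history.log.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    log = data_dir / f"{category}.log"

    if not args:  # read mode
        if not log.exists():
            return []
        return log.read_text(encoding="utf-8").splitlines()[-RECENT_LIMIT:]

    # write mode: assumed timestamp format and pipe-delimited layout
    value = " ".join(args)
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    entry = f"{stamp}|{value}"
    with log.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
    with (data_dir / "history.log").open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp}|{category}: {value}\n")  # hypothetical history layout
    return [entry]
```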

Data Storage

All data is stored locally in ~/.local/share/rag-evaluator/:

Each command creates its own log file (e.g., configure.log, benchmark.log)
A unified history.log tracks all activity across commands
Entries are stored in timestamp|value pipe-delimited format
Export supports JSON, CSV, and plain text formats
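Because each entry is a single `timestamp|value` line, parsing and exporting take only a few lines of code. This Python sketch assumes values may themselves contain `|` (so only the first pipe separates the fields) and uses hypothetical `timestamp`/`value` field names; the real exporter's output layout may differ.

```python
import csv
import io
import json


def parse_entries(text: str) -> list[dict[str, str]]:
    """Split `timestamp|value` lines; only the first pipe separates the fields."""
    entries: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        timestamp, sep, value = line.partition("|")
        if not sep:
            continue  # skip malformed lines with no delimiter
        entries.append({"timestamp": timestamp, "value": value})
    return entries


def export(entries: list[dict[str, str]], fmt: str) -> str:
    """Render parsed entries as json, csv, or txt."""
    if fmt == "json":
        return json.dumps(entries, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["timestamp", "value"], lineterminator="\n")
        writer.writeheader()
        writer.writerows(entries)
        return buf.getvalue()
    if fmt == "txt":
        return "".join(f"{e['timestamp']}  {e['value']}\n" for e in entries)
    raise ValueError(f"unsupported format: {fmt}")
```

Using the `csv` module rather than joining strings by hand means values containing commas or quotes are escaped correctly.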

Requirements

Bash 4+ (the script runs in `set -euo pipefail` strict mode)
Standard Unix utilities: date, wc, du, tail, grep, sed, cat
No external dependencies or API keys required

When to Use

Evaluating RAG pipeline quality — log evaluation scores, compare retrieval strategies, and track improvements over time
Benchmarking different configurations — run benchmarks across embedding models, chunk sizes, or retrieval methods and compare results side by side
Tracking costs and usage — monitor API costs and token usage across experiments to stay within budget
Managing prompt engineering — log prompt variations, test them against your pipeline, and analyze which templates perform best
Generating reports for stakeholders — export evaluation data as JSON/CSV for dashboards, or generate text reports summarizing RAG performance


Metadata

Author: @bytesagain1

Stars: 4097

Updated: 2026-04-14



Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-bytesagain1-rag-evaluator": {
      "enabled": true,
      "auto_update": true
    }
  }
}
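Pasting the snippet as the entire file would drop any other plugins already configured. A minimal Python sketch that merges the entry into an existing clawhub.json instead, assuming the file is plain JSON at a path you supply; `enable_plugin` is a hypothetical helper, not part of ClawHub's own tooling.

```python
import json
from pathlib import Path

PLUGIN_KEY = "official-bytesagain1-rag-evaluator"


def enable_plugin(config_path: Path) -> dict:
    """Merge the rag-evaluator entry into clawhub.json, keeping all other settings."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}  # start a fresh config if the file is missing
    plugins = config.setdefault("plugins", {})
    entry = plugins.setdefault(PLUGIN_KEY, {})  # preserve any extra per-plugin keys
    entry.update({"enabled": True, "auto_update": True})
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```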

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

bonded

Bonded warehouse reference — customs procedures, duty deferral, FTZ operations, compliance requirements. Use when managing bonded storage, customs clearance, or duty suspension logistics.

bytesagain1 4126

cmms

Computerized maintenance management system

bytesagain1 4126

benchmark-tool

Benchmark CPU, memory, disk I/O, and network on your system. Use when measuring server performance.

bytesagain1 4126

console

Console & terminal output reference — logging levels, ANSI colors, debugging techniques, formatters. Use when styling terminal output, implementing log systems, or debugging with console tools.

bytesagain1 4126

System Prompts And Models Of Ai Tools

FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, L system prompts and models of ai tools, python, ai, bolt, cluely, copilot, cursor. Use when you need system prompts and models of ai tools capabilities. Triggers on: system prompts and models of ai tools.

bytesagain1 4126