Official Verified

Ragaai Catalyst

Python SDK for an agent AI observability, monitoring, and evaluation framework. Tags: ragaai catalyst, python, agentic-ai.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/bytesagain1/rag-evaluator

Rag Evaluator

AI-powered RAG (Retrieval-Augmented Generation) evaluation toolkit. Configure, benchmark, compare, and optimize your RAG pipelines from the command line. Track prompts, evaluations, fine-tuning experiments, costs, and usage — all with persistent local logging and full export capabilities.

Commands

Run rag-evaluator <command> [args] to use.

Command         Description
configure       Configure RAG evaluation settings and parameters
benchmark       Run benchmarks against your RAG pipeline
compare         Compare results across different RAG configurations
prompt          Log and manage prompt templates and variations
evaluate        Evaluate RAG output quality and relevance
fine-tune       Track fine-tuning experiments and parameters
analyze         Analyze evaluation results and identify patterns
cost            Track and log API/inference costs
usage           Monitor token usage and API call volumes
optimize        Log optimization strategies and results
test            Run test cases against RAG configurations
report          Generate evaluation reports
stats           Show summary statistics across all categories
export <fmt>    Export data in json, csv, or txt format
search <term>   Search across all logged entries
recent          Show recent activity from history log
status          Health check: version, data dir, disk usage
help            Show help and available commands
version         Show version (v2.0.0)

Each domain command (configure, benchmark, compare, etc.) works in two modes:

  • Without arguments: displays the most recent 20 entries from that category
  • With arguments: logs the input with a timestamp and saves to the category log file
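
The two modes can be pictured with a small shell function. This is only a sketch of the behaviour described above, not the tool's actual source: the function name log_or_show, the timestamp format, and the layout of the history.log line are all assumptions made for illustration.

```shell
# Hypothetical sketch of the two-mode behaviour; not the tool's real code.
# DATA_DIR mirrors the documented default and can be overridden.
DATA_DIR="${DATA_DIR:-$HOME/.local/share/rag-evaluator}"

log_or_show() {
  category=$1
  shift
  mkdir -p "$DATA_DIR"
  logfile="$DATA_DIR/$category.log"
  if [ "$#" -eq 0 ]; then
    # No arguments: show the 20 most recent entries, if the log exists.
    if [ -f "$logfile" ]; then
      tail -n 20 "$logfile"
    fi
  else
    # With arguments: append "timestamp|value" to the category log
    # and record the same event in the unified history log.
    ts=$(date '+%Y-%m-%d %H:%M:%S')   # timestamp format is an assumption
    printf '%s|%s\n' "$ts" "$*" >> "$logfile"
    # history.log line layout (timestamp|category|value) is an assumption
    printf '%s|%s|%s\n' "$ts" "$category" "$*" >> "$DATA_DIR/history.log"
  fi
}
```

Calling log_or_show evaluate "faithfulness=0.87" would append one line to evaluate.log, while log_or_show evaluate with no further arguments would print the latest entries from that file.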

Data Storage

All data is stored locally in ~/.local/share/rag-evaluator/:

  • Each command creates its own log file (e.g., configure.log, benchmark.log)
  • A unified history.log tracks all activity across commands
  • Entries are stored in timestamp|value pipe-delimited format
  • Export supports JSON, CSV, and plain text formats
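
Because entries are plain timestamp|value lines, they are easy to post-process outside the tool. The sketch below converts one log file to CSV; it illustrates the storage format and is not the tool's own export implementation. The function name log_to_csv is hypothetical, and the code assumes the timestamp itself contains no commas or quotes.

```shell
# Sketch only: convert a "timestamp|value" log file to CSV on stdout.
# Splits on the first "|" so values that contain pipes stay intact.
log_to_csv() {
  printf 'timestamp,value\n'
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip lines that do not follow the timestamp|value layout.
    case $line in
      *'|'*) ;;
      *) continue ;;
    esac
    ts=${line%%|*}      # text before the first pipe
    value=${line#*|}    # everything after the first pipe
    # CSV quoting: double embedded quotes, then wrap the field in quotes.
    escaped=$(printf '%s' "$value" | sed 's/"/""/g')
    printf '%s,"%s"\n' "$ts" "$escaped"
  done < "$1"
}
```

For example, log_to_csv ~/.local/share/rag-evaluator/evaluate.log > evaluate.csv would produce a file that spreadsheet tools can open directly.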

Requirements

  • Bash 4+ (the script runs in strict mode: set -euo pipefail)
  • Standard Unix utilities: date, wc, du, tail, grep, sed, cat
  • No external dependencies or API keys required
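
A quick pre-flight check can confirm these requirements before installing. The following is a minimal sketch; the helper name missing_utils is hypothetical and is not part of rag-evaluator.

```shell
# Sketch: print each required utility that is missing from PATH.
# Returns non-zero if at least one utility is missing.
missing_utils() {
  status=0
  for util in "$@"; do
    if ! command -v "$util" >/dev/null 2>&1; then
      printf '%s\n' "$util"
      status=1
    fi
  done
  return "$status"
}

# Check the utilities listed above.
missing_utils date wc du tail grep sed cat || echo "Install the utilities listed above."

# BASH_VERSINFO is set by Bash itself; it is empty in other shells.
[ "${BASH_VERSINFO:-0}" -ge 4 ] || echo "Bash 4+ is required."
```

If nothing is printed, every listed utility was found on PATH and the running shell reports Bash 4 or newer.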

When to Use

  1. Evaluating RAG pipeline quality — log evaluation scores, compare retrieval strategies, and track improvements over time
  2. Benchmarking different configurations — run benchmarks across embedding models, chunk sizes, or retrieval methods and compare results side by side
  3. Tracking costs and usage — monitor API costs and token usage across experiments to stay within budget
  4. Managing prompt engineering — log prompt variations, test them against your pipeline, and analyze which templates perform best
  5. Generating reports for stakeholders — export evaluation data as JSON/CSV for dashboards, or generate text reports summarizing RAG performance

Examples

Metadata

Stars: 4097
Views: 1
Updated: 2026-04-14
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-bytesagain1-rag-evaluator": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.