Official Verified

RAG Pipeline Starter

Skill by abhinas90

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/abhinas90/rag-pipeline-starter

RAG Pipeline Starter

Production-grade RAG pipeline setup with chunking strategies, embedding benchmarks, and retrieval tuning for 50K-500K row datasets.

Overview

This skill provides a complete toolkit for building and optimizing RAG (Retrieval-Augmented Generation) pipelines. It analyzes your data, recommends optimal chunking strategies, benchmarks embedding models, and helps tune retrieval parameters for maximum accuracy.

When to Use

Building a new RAG system from scratch
Optimizing an existing RAG pipeline's retrieval quality
Choosing the right embedding model for your domain
Processing large document collections (50K-500K rows)
Balancing speed vs. accuracy for your use case

Scripts

chunking_analyzer.py

Analyzes documents and recommends optimal chunking strategies based on content structure.

Usage:

# Assess data and get strategy recommendation
python chunking_analyzer.py --assess ./data

# Apply chunking strategy to documents
python chunking_analyzer.py --strategy recursive --input ./data/doc.txt --output ./chunks/ --chunk-size 500 --overlap 50

Options:

--assess <dir> - Analyze documents and recommend strategy
--strategy <name> - Chunking strategy: fixed, semantic, recursive, hierarchical
--input <path> - Input file or directory
--output <dir> - Output directory for chunks
--chunk-size <int> - Chunk size (default: 500)
--overlap <int> - Overlap between chunks (default: 50)
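
To illustrate how the `--chunk-size` and `--overlap` options interact, here is a minimal sketch of fixed-size chunking with overlap. It is not the script's actual implementation: the function name `chunk_text` is hypothetical, and the sketch assumes sizes are measured in characters, which the documentation above does not specify.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters.

    Hypothetical helper for illustration; units (characters) are an assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(1000))
pieces = chunk_text(sample, chunk_size=500, overlap=50)
print(len(pieces), [len(p) for p in pieces])
```

With the defaults, each window advances 450 characters, so the last 50 characters of one chunk reappear at the start of the next. A larger overlap preserves more context across chunk boundaries at the cost of more chunks to embed.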

embedding_benchmark.py

Tests multiple embedding models on your data to find the best fit for your domain.

Usage:

python embedding_benchmark.py --data ./chunks/ --domain finance --output results.json

Options:

--embeddings <models> - Embedding models to test (space-separated)
--data <dir> - Directory with chunked text files (required)
--domain <name> - Domain name for context-specific recommendations
--output <file> - Output file for results (JSON)

Supported Embeddings:

sentence-transformers/all-MiniLM-L6-v2 (384 dims, fast, free)
sentence-transformers/all-mpnet-base-v2 (768 dims, medium, free)
openai/text-embedding-ada-002 (1536 dims, fast, paid)
cohere/embed-english-v3.0 (1024 dims, fast, paid)
bm25 (sparse, fast, free)
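
As a rough picture of what comparing embedding models involves, the sketch below ranks chunks by cosine similarity and reports a hit rate at k for each model. Everything here is an assumption for illustration: the names `cosine` and `hit_rate_at_k`, the metric itself, and the toy 2-D vectors standing in for real model output. The actual script may measure something different.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hit_rate_at_k(query_vecs, chunk_vecs, relevant, k=1):
    """Fraction of queries whose relevant chunk id appears in the top-k results."""
    if not query_vecs:
        return 0.0
    hits = 0
    for qid, qvec in query_vecs.items():
        ranked = sorted(chunk_vecs,
                        key=lambda cid: cosine(qvec, chunk_vecs[cid]),
                        reverse=True)
        if relevant[qid] in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy vectors standing in for two hypothetical models' embeddings.
embeddings = {
    "model_a": {"chunks": {"c1": [1.0, 0.0], "c2": [0.0, 1.0]},
                "queries": {"q1": [0.9, 0.1], "q2": [0.2, 0.8]}},
    "model_b": {"chunks": {"c1": [1.0, 1.0], "c2": [1.0, 0.9]},
                "queries": {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}},
}
relevant = {"q1": "c1", "q2": "c2"}  # expected chunk per query

scores = {name: hit_rate_at_k(v["queries"], v["chunks"], relevant, k=1)
          for name, v in embeddings.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

In practice the vectors would come from the models listed above, and the labeled query set would need to reflect the target domain for the comparison to be meaningful.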

retrieval_tuner.py

Optimizes retrieval parameters (top-k, similarity threshold) for your specific use case.

Usage:

python retrieval_tuner.py --index ./vector_store/ --queries ./test_queries.json --output tuning_results.json

Options:

--index <dir> - Vector store index directory
--queries <file> - JSON file with test queries and expected results
--output <file> - Output file for tuning results
--top-k-range <min> <max> - Range of top-k values to test (default: 1 20)
--threshold-range <min> <max> <step> - Similarity threshold range
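
The kind of sweep that `--top-k-range` and `--threshold-range` describe can be sketched as a small grid search. This is an illustrative outline, not the script's implementation: the functions `evaluate` and `sweep`, the in-memory data layout, and the choice of F1 as the selection metric are all assumptions.

```python
def evaluate(results, expected, top_k, threshold):
    """Mean recall and precision when keeping the top_k hits scoring >= threshold."""
    if not results:
        return 0.0, 0.0
    recalls, precisions = [], []
    for qid, hits in results.items():
        kept = [doc for doc, score in hits[:top_k] if score >= threshold]
        relevant = expected[qid]
        found = len(set(kept) & relevant)
        recalls.append(found / len(relevant) if relevant else 0.0)
        precisions.append(found / len(kept) if kept else 0.0)
    n = len(results)
    return sum(recalls) / n, sum(precisions) / n


def sweep(results, expected, k_values, thresholds):
    """Try every (top_k, threshold) pair; return the best by F1 plus the full grid."""
    grid = []
    for k in k_values:
        for t in thresholds:
            recall, precision = evaluate(results, expected, k, t)
            total = recall + precision
            f1 = 2 * recall * precision / total if total else 0.0
            grid.append({"top_k": k, "threshold": t,
                         "recall": recall, "precision": precision, "f1": f1})
    # max() keeps the first maximum, so ties resolve to the smallest top_k tried.
    return max(grid, key=lambda row: row["f1"]), grid


# Hypothetical retrieval output: per query, (doc_id, similarity) sorted by score.
results = {
    "q1": [("d1", 0.9), ("d2", 0.4), ("d3", 0.3)],
    "q2": [("d4", 0.8), ("d5", 0.7), ("d1", 0.2)],
}
expected = {"q1": {"d1"}, "q2": {"d4", "d5"}}  # relevant doc ids per query

best, grid = sweep(results, expected, k_values=range(1, 4), thresholds=[0.0, 0.5])
print(best)
```

Raising top-k tends to improve recall and hurt precision, while a similarity threshold trims low-scoring hits. Sweeping both together shows where the trade-off settles for a given query set.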

vector_store_manager.py

Manages vector store operations: create, update, search, and maintain indexes.

Metadata

Author: @abhinas90

Stars: 4473

Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-abhinas90-rag-pipeline-starter": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.

Related Skills

Multi-Agent Deployment Skill for OpenClaw

Deploy a production-ready multi-agent fleet in OpenClaw. Includes step-by-step setup guide, workspace templates, and Python automation scripts for agent creation, routing config, memory sync, and cloud deployment — based on a real working 4-agent production setup.

abhinas90 4473

Claude Code Mastery

Complete guide to mastering Claude Code CLI — installation to production workflows

abhinas90 4473

Claude Code Memory Kit

Stop Claude Code from repeating mistakes — enforce guardrails, preserve context, maintain consistency across sessions

abhinas90 4473