Homelab Cluster Management

Manage a compound AI compute cluster spanning multiple tiers of GPU and CPU inference nodes. Built and battle-tested by Lumina Homelab.

When to Use

Use this skill when your agent needs to:

Monitor health of distributed model endpoints
Route inference requests to the best available model
Recover downed nodes automatically
Plan GPU memory allocation across models
Deploy models across heterogeneous hardware

Architecture Pattern

A homelab cluster typically spans 2-3 tiers:

Tier	Typical Hardware	Runtime	Role
Local	Primary GPU (RTX 4090/5090)	Ollama	Fast inference, embeddings
Remote	Secondary GPU (RTX 3090/4090)	llama.cpp or Ollama	Distributed inference
NAS/CPU	Synology, RPi, any CPU node	Ollama	Lightweight models, fallback

A LiteLLM proxy sits in front, providing a unified OpenAI-compatible API across all tiers.

Health Monitoring

Check all endpoints with configurable per-endpoint timeouts:

# Define endpoints with tier labels
ENDPOINTS = {
    "local/ollama": {"url": "http://localhost:11434/api/tags", "tier": "LOCAL"},
    "remote/mark-i": {"url": "http://REMOTE_IP:3009/v1/models", "tier": "REMOTE", "timeout": 8},
    "gateway/litellm": {"url": "http://localhost:8080/health/liveliness", "tier": "GATEWAY"},
}

# For each endpoint: GET with timeout, check HTTP 200
# Classify: HEALTHY / DEGRADED / DOWN per tier
# Overall prognosis based on tier health

Key lesson: Use /health/liveliness for LiteLLM, not /health — the latter probes all model routes and hangs if any are unreachable.

Expert MoE Routing

Route requests to the optimal model based on task classification:

Task Categories:
  code     → Coder model (Qwen2.5-Coder-7B or similar)
  reason   → Reasoning model (DeepSeek-R1-Distill or similar)
  chat     → General model (Qwen2.5-14B or similar)
  vision   → Vision model (Qwen2.5-VL or similar)
  fast     → Smallest available model for quick responses
  embed    → Embedding model (nomic-embed-text or similar)

Router logic:
  1. Classify task from prompt
  2. Check health of preferred model
  3. Fallback to next-best if unavailable
  4.

homelab-cluster

Install via CLI (Recommended)

Homelab Cluster Management

When to Use

Architecture Pattern

Health Monitoring

Expert MoE Routing

Metadata