ClawKit Logo
ClawKitReliability Toolkit
Back to Registry
Official Verified

homelab-cluster

Manage multi-tier AI inference clusters for homelabs. Health monitoring, expert MoE routing, automatic node recovery, and model deployment across Ollama and llama.cpp nodes. Covers GPU memory planning, Docker volume strategies for large models, sequential startup patterns to avoid CUDA deadlocks, and unified API gateways via LiteLLM.

skill-install — Terminal

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/mlesnews/homelab-cluster
Or

Homelab Cluster Management

Manage a compound AI compute cluster spanning multiple tiers of GPU and CPU inference nodes. Built and battle-tested by Lumina Homelab.

When to Use

Use this skill when your agent needs to:

  • Monitor health of distributed model endpoints
  • Route inference requests to the best available model
  • Recover downed nodes automatically
  • Plan GPU memory allocation across models
  • Deploy models across heterogeneous hardware

Architecture Pattern

A homelab cluster typically spans 2-3 tiers:

TierTypical HardwareRuntimeRole
LocalPrimary GPU (RTX 4090/5090)OllamaFast inference, embeddings
RemoteSecondary GPU (RTX 3090/4090)llama.cpp or OllamaDistributed inference
NAS/CPUSynology, RPi, any CPU nodeOllamaLightweight models, fallback

A LiteLLM proxy sits in front, providing a unified OpenAI-compatible API across all tiers.

Health Monitoring

Check all endpoints with configurable per-endpoint timeouts:

# Define endpoints with tier labels
ENDPOINTS = {
    "local/ollama": {"url": "http://localhost:11434/api/tags", "tier": "LOCAL"},
    "remote/mark-i": {"url": "http://REMOTE_IP:3009/v1/models", "tier": "REMOTE", "timeout": 8},
    "gateway/litellm": {"url": "http://localhost:8080/health/liveliness", "tier": "GATEWAY"},
}

# For each endpoint: GET with timeout, check HTTP 200
# Classify: HEALTHY / DEGRADED / DOWN per tier
# Overall prognosis based on tier health

Key lesson: Use /health/liveliness for LiteLLM, not /health — the latter probes all model routes and hangs if any are unreachable.

Expert MoE Routing

Route requests to the optimal model based on task classification:

Task Categories:
  code     → Coder model (Qwen2.5-Coder-7B or similar)
  reason   → Reasoning model (DeepSeek-R1-Distill or similar)
  chat     → General model (Qwen2.5-14B or similar)
  vision   → Vision model (Qwen2.5-VL or similar)
  fast     → Smallest available model for quick responses
  embed    → Embedding model (nomic-embed-text or similar)

Router logic:
  1. Classify task from prompt
  2. Check health of preferred model
  3. Fallback to next-best if unavailable
  4.

Metadata

Author@mlesnews
Stars1401
Views0
Updated2026-02-24
View Author Profile
AI Skill Finder

Not sure this is the right skill?

Describe what you want to build — we'll match you to the best skill from 16,000+ options.

Find the right skill
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-mlesnews-homelab-cluster": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety NoteClawKit audits metadata but not runtime behavior. Use with caution.