local-first-llm

Routes LLM requests to a local model (Ollama, LM Studio, or llamafile) before falling back to cloud APIs, and tracks token savings and avoided cost in a persistent dashboard. Use when: (1) the user asks to run a task with a local model first, (2) the user wants to reduce cloud API costs or keep requests private, (3) the user asks to see their token savings or LLM routing dashboard, or (4) local-vs-cloud routing should be decided automatically for a request.

Install via CLI (Recommended)

clawhub install openclaw/skills/skills/joelnishanth/local-first-llm

Local-First LLM

Route requests to a local LLM first; fall back to cloud only when necessary. Track every decision to show real token and cost savings.

Quick Start

1. Check if a local LLM is running

python3 skills/local-first-llm/scripts/check_local.py

Returns JSON: { "any_available": true, "best": { "provider": "ollama", "models": [...] } }
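A minimal sketch of consuming that output, assuming the script prints a single JSON object shaped like the example above. The helper name `pick_provider` and the sample model name are illustrative, and the element type of `models` (plain strings here) is an assumption, since the source elides the list.

```python
import json


def pick_provider(raw: str):
    """Return (provider, models) from check_local.py-style JSON, or None.

    Assumes the shape shown above: an "any_available" flag plus a "best"
    object holding "provider" and "models".
    """
    status = json.loads(raw)
    if not status.get("any_available"):
        return None
    best = status.get("best") or {}
    return best.get("provider"), best.get("models", [])


# Illustrative sample only; the real list of models depends on your machine.
sample = '{"any_available": true, "best": {"provider": "ollama", "models": ["llama3.2"]}}'
print(pick_provider(sample))
```

Returning `None` when nothing is available gives the caller a single value to test before deciding whether to pass `--local-available` in the next step.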

2. Route a request

python3 skills/local-first-llm/scripts/route_request.py \
  --prompt "Summarize this meeting transcript" \
  --tokens 800 \
  --local-available \
  --local-provider ollama

Returns: { "decision": "local", "reason": "...", "complexity_score": -1 }

3. Log the outcome

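After executing the request, record it. In this example, --model gpt-4o appears to name the cloud model used as the cost baseline, even though the request itself ran locally: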

python3 skills/local-first-llm/scripts/track_savings.py log \
  --tokens 800 \
  --model gpt-4o \
  --routed-to local
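To make the idea of "cost avoidance" concrete, here is a rough sketch of how a logged request could be turned into an avoided-cost figure. It does not reproduce track_savings.py: the price table, the JSON Lines log format, and every function name below are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical USD price per 1,000 tokens; real prices vary by provider and date.
ASSUMED_PRICE_PER_1K = {"gpt-4o": 0.005}


def avoided_cost(tokens: int, model: str, routed_to: str) -> float:
    """Cloud cost avoided by one request; zero if it went to the cloud anyway."""
    if routed_to != "local":
        return 0.0
    return tokens / 1000 * ASSUMED_PRICE_PER_1K.get(model, 0.0)


def log_request(path: Path, tokens: int, model: str, routed_to: str) -> dict:
    """Append one routing outcome to a JSON Lines file and return the entry."""
    entry = {
        "tokens": tokens,
        "model": model,
        "routed_to": routed_to,
        "avoided_usd": avoided_cost(tokens, model, routed_to),
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def total_avoided(path: Path) -> float:
    """Sum the avoided cost over every logged request."""
    with path.open(encoding="utf-8") as fh:
        return sum(json.loads(line)["avoided_usd"] for line in fh if line.strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "savings.jsonl"
        log_request(log, 800, "gpt-4o", "local")
        log_request(log, 1200, "gpt-4o", "cloud")
        print(f"avoided so far: ${total_avoided(log):.4f}")
```

An append-only log keeps each decision auditable, and a dashboard can recompute totals from it at any time.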

4. Show the dashboard

python3 skills/local-first-llm/scripts/dashboard.py

Full Routing Workflow

┌─────────────────────────────────────────────────────┐
│  1. check_local.py  →  is a local provider running? │
│                                                      │
│  2. route_request.py  →  local or cloud?             │
│     - sensitivity check  (private data → local)      │
│     - complexity score   (high score → cloud)        │
│     - availability gate  (no local → cloud)          │
│                                                      │
│  3. Execute with the chosen provider                 │
│                                                      │
│  4. track_savings.py log  →  record the outcome      │
│                                                      │
│  5. dashboard.py  →  show cumulative savings         │
└─────────────────────────────────────────────────────┘
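The first two steps of the workflow can be chained from Python by shelling out to the scripts. This is a sketch under stated assumptions: it uses only the script paths and flags shown in the Quick Start, assumes each script prints a single JSON object on stdout, and assumes that omitting `--local-available` signals that no local provider is running. The helper names are hypothetical.

```python
import json
import subprocess
import sys

SCRIPTS = "skills/local-first-llm/scripts"


def check_cmd() -> list:
    """Command line for step 1 (provider check)."""
    return [sys.executable, f"{SCRIPTS}/check_local.py"]


def route_cmd(prompt: str, tokens: int, provider: str = None) -> list:
    """Command line for step 2; local flags are added only if a provider exists."""
    cmd = [sys.executable, f"{SCRIPTS}/route_request.py",
           "--prompt", prompt, "--tokens", str(tokens)]
    if provider:
        cmd += ["--local-available", "--local-provider", provider]
    return cmd


def log_cmd(tokens: int, model: str, routed_to: str) -> list:
    """Command line for step 4 (recording the outcome)."""
    return [sys.executable, f"{SCRIPTS}/track_savings.py", "log",
            "--tokens", str(tokens), "--model", model, "--routed-to", routed_to]


def run_json(cmd: list) -> dict:
    """Run a script and parse its stdout as JSON (assumed output format)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def decide(prompt: str, tokens: int) -> str:
    """Steps 1 and 2: check for a provider, then ask the router for a decision."""
    status = run_json(check_cmd())
    provider = status["best"]["provider"] if status.get("any_available") else None
    return run_json(route_cmd(prompt, tokens, provider))["decision"]
```

Building the argument lists in separate functions keeps them easy to inspect and test without the scripts installed. Only `decide` actually launches a process.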

Routing Rules (Summary)

Condition	Route
No local provider available	☁️ Cloud
Prompt contains sensitive data (`password`, `secret`, `api key`, `ssn`, etc.)	🏠 Local
Complexity score ≥ 3	☁️ Cloud
Complexity score < 3	🏠 Local

For full scoring details, see references/routing-logic.md.
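The summary table can be read as a short decision function. The sketch below is not the code in route_request.py. It assumes the rules are applied in the order the table lists them, uses only the four sensitive terms the table names (the table ends in "etc."), and takes the complexity score as an input rather than computing it.

```python
# Only the terms named in the table; the real list is longer ("etc.").
SENSITIVE_TERMS = ("password", "secret", "api key", "ssn")
CLOUD_THRESHOLD = 3  # scores at or above this go to the cloud


def route(prompt: str, complexity_score: int, local_available: bool) -> dict:
    """Apply the summary rules top to bottom (assumed precedence)."""
    if not local_available:
        return {"decision": "cloud", "reason": "no local provider available"}

    # Crude case-insensitive substring match; a real check may be stricter.
    lowered = prompt.lower()
    if any(term in lowered for term in SENSITIVE_TERMS):
        return {"decision": "local", "reason": "prompt contains sensitive data"}

    if complexity_score >= CLOUD_THRESHOLD:
        return {"decision": "cloud", "reason": "complexity score at or above threshold"}

    return {"decision": "local", "reason": "complexity score below threshold"}


print(route("Summarize this meeting transcript", -1, True)["decision"])
```

Under this ordering, a sensitive prompt stays local even when its complexity score is high, but still goes to the cloud if no local provider is running. Whether the actual script handles that last case the same way is not stated in the summary.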

Executing with a Local Provider

Once route_request.py returns "decision": "local", send the request:

Ollama

curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "YOUR_PROMPT", "stream": false}'

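LM Studio / llamafile (OpenAI-compatible; port 1234 is LM Studio's default, and llamafile typically serves on 8080, so adjust the port as needed)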

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "YOUR_PROMPT"}]}'
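The same two requests can be sent from Python with only the standard library. This is a sketch mirroring the curl commands above: the endpoints, ports, and model names come from those commands, while the function names are hypothetical. Response parsing relies on Ollama returning the text in a `response` field when streaming is off, and on OpenAI-compatible servers returning it under `choices[0].message.content`.

```python
import json
import urllib.request

JSON_HEADERS = {"Content-Type": "application/json"}


def ollama_request(prompt: str, model: str = "llama3.2",
                   base: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(f"{base}/api/generate",
                                  data=body.encode("utf-8"), headers=JSON_HEADERS)


def openai_compat_request(prompt: str, model: str = "local-model",
                          base: str = "http://localhost:1234") -> urllib.request.Request:
    """Build a chat-completions request for LM Studio or llamafile."""
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(f"{base}/v1/chat/completions",
                                  data=body.encode("utf-8"), headers=JSON_HEADERS)


def extract_text(payload: dict) -> str:
    """Pull the generated text out of either response shape."""
    if "response" in payload:  # Ollama /api/generate
        return payload["response"]
    return payload["choices"][0]["message"]["content"]  # OpenAI-style


def send(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """POST the request (urllib uses POST when data is set) and return the text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

For example, `send(ollama_request("YOUR_PROMPT"))` corresponds to the Ollama curl command, provided a server is listening on that port.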

Metadata

Author: @joelnishanth

Stars: 1947

Updated: 2026-03-04

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-joelnishanth-local-first-llm": {
      "enabled": true,
      "auto_update": true
    }
  }
}

Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.
