Official Verified

tandemn-tuna

Deploy and serve LLM models on GPU. Compare GPU pricing. Launch vLLM on Modal, RunPod, Cerebrium, Cloud Run, Baseten, or Azure with spot instance fallback. OpenAI-compatible inference endpoint.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/choprahetarth/tandemn-tuna

Tuna — Deploy and Serve LLM Models on GPU Infrastructure

Tuna is a hybrid GPU inference orchestrator. It lets you deploy, serve, and manage LLM models (Llama, Qwen, Mistral, DeepSeek, Gemma, and any HuggingFace model) on serverless GPUs from Modal, RunPod, Cerebrium, Google Cloud Run, Baseten, or Azure Container Apps, with optional spot instance fallback on AWS via SkyPilot. Every deployment gets an OpenAI-compatible /v1/chat/completions endpoint.

The key idea: serverless GPUs handle requests immediately (fast cold start, pay-per-second) while a cheaper spot GPU boots in the background. Once the spot instance is ready, traffic shifts to it. If the spot instance is preempted, traffic falls back to serverless automatically. The project states that this yields 3–5x cost savings over pure serverless, with no downtime during the switch.
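The routing behavior described above can be pictured as a two-state switch. The following is a toy sketch only, not Tuna's router: the `HybridRouter` class, its method names, and the `Backend` enum are all hypothetical and exist purely to illustrate the serverless-first, spot-when-ready, fall-back-on-preemption flow.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    SERVERLESS = "serverless"
    SPOT = "spot"


@dataclass
class HybridRouter:
    """Toy model of the hybrid routing idea (hypothetical, not Tuna's code)."""

    spot_ready: bool = False

    def on_spot_ready(self) -> None:
        # The spot GPU finished booting: prefer it from now on.
        self.spot_ready = True

    def on_spot_preempted(self) -> None:
        # The spot GPU was reclaimed: fall back to serverless.
        self.spot_ready = False

    def pick_backend(self) -> Backend:
        # Serverless is always available, so it is the default target.
        return Backend.SPOT if self.spot_ready else Backend.SERVERLESS


router = HybridRouter()
timeline = [router.pick_backend()]   # before spot is up
router.on_spot_ready()
timeline.append(router.pick_backend())  # spot is serving
router.on_spot_preempted()
timeline.append(router.pick_backend())  # back on serverless
print([b.value for b in timeline])  # ['serverless', 'spot', 'serverless']
```

A real router would also need health checks and draining of in-flight requests; the sketch leaves those out to keep the state transitions visible.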

Quick Start — Deploy a Model in 3 Commands

# 1. Install tuna
uv pip install tandemn-tuna

# 2. Deploy a model (auto-picks cheapest serverless provider for the GPU)
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --service-name my-llm

# 3. Query your endpoint (shown in deploy output)
curl http://<router-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello!"}]}'
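
The same request as the curl call in step 3 can be issued from Python using only the standard library. This is a minimal sketch, assuming the endpoint returns the usual OpenAI chat-completions response shape (`choices[0].message.content`); the helper names `build_chat_request` and `send_chat` are hypothetical, and `203.0.113.10` is a documentation-only placeholder for the router IP shown in your deploy output.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first reply (assumes OpenAI response shape)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Placeholder address: replace with the router IP from the deploy output.
req = build_chat_request("http://203.0.113.10:8080", "Qwen/Qwen3-0.6B", "Hello!")
print(req.full_url)
```

Calling `send_chat(req)` performs the actual network request; it is left out of the example so the snippet runs without a live deployment.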

For a serverless-only deployment (no spot backend, no AWS account needed):

tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --serverless-only

All Commands

tuna deploy — Launch a model on GPU

Deploy a model across serverless + spot infrastructure. This is the main command.

tuna deploy --model <HuggingFace-model-ID> --gpu <GPU> [options]

Required arguments:

  • --model — HuggingFace model ID (e.g., Qwen/Qwen3-0.6B, meta-llama/Llama-3-70b)
  • --gpu — GPU type (e.g., T4, L4, L40S, A100, H100, B200)

Common options:

  • --service-name — Name for the deployment (auto-generated if omitted)
  • --serverless-provider — Force a specific provider: modal, runpod, cloudrun, baseten, azure, cerebrium (default: cheapest available)
  • --serverless-only — Serverless only, no spot backend or router (no AWS needed)
  • --gpu-count — Number of GPUs (default: 1)
  • --tp-size — Tensor parallel size (default: 1)
  • --max-model-len — Max sequence length (default: 4096)
  • --spots-cloud — Cloud for spot GPUs: aws or azure (default: aws)
  • --region — Cloud region for spot instances
  • --concurrency — Override serverless concurrency limit
  • --no-scale-to-zero — Keep at least 1 spot replica running
  • --public — Make endpoint publicly accessible (no auth)
  • --scaling-policy — Path to YAML with scaling parameters

Provider-specific options:

  • --gcp-project, --gcp-region — For Cloud Run
  • --azure-subscription, --azure-resource-group, --azure-region, --azure-environment — For Azure
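
When deployments are scripted, the options above can be assembled into a command line programmatically. The sketch below is illustrative: `build_deploy_command` is a hypothetical helper, it covers only a handful of the flags listed above, and it assumes each value-taking flag accepts a single space-separated value.

```python
import shlex
from typing import List, Optional


def build_deploy_command(
    model: str,
    gpu: str,
    service_name: Optional[str] = None,
    serverless_provider: Optional[str] = None,
    serverless_only: bool = False,
    gpu_count: int = 1,
) -> List[str]:
    """Assemble a `tuna deploy` argument list from a subset of the documented flags."""
    cmd = ["tuna", "deploy", "--model", model, "--gpu", gpu]
    if service_name is not None:
        cmd += ["--service-name", service_name]
    if serverless_provider is not None:
        cmd += ["--serverless-provider", serverless_provider]
    if serverless_only:
        cmd.append("--serverless-only")
    if gpu_count != 1:
        # Only pass the flag when it differs from the documented default of 1.
        cmd += ["--gpu-count", str(gpu_count)]
    return cmd


cmd = build_deploy_command(
    "Qwen/Qwen3-0.6B", "L4", service_name="my-llm", serverless_only=True
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `tuna` is installed; building a list rather than a string avoids shell-quoting problems with model IDs.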

Examples:

# Deploy Llama 3 on Modal with hybrid spot
tuna deploy --model meta-llama/Meta-Llama-3-8B --gpu A100 --serverless-provider modal

Metadata

Stars: 3683
Views: 0
Updated: 2026-04-01
Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-choprahetarth-tandemn-tuna": {
      "enabled": true,
      "auto_update": true
    }
  }
}
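
If clawhub.json already lists other plugins, the entry can be merged in rather than pasted over the file. This is a sketch under assumptions: it takes the top-level `plugins` object from the snippet above as the only known structure, `enable_plugin` is a hypothetical helper, and `some-other-plugin` is a made-up entry used to show that existing plugins are preserved.

```python
import copy
import json
from typing import Any, Dict

PLUGIN_ID = "official-choprahetarth-tandemn-tuna"


def enable_plugin(
    config: Dict[str, Any], plugin_id: str, auto_update: bool = True
) -> Dict[str, Any]:
    """Return a copy of the config with the plugin enabled; the input is not modified."""
    updated = copy.deepcopy(config)
    plugins = updated.setdefault("plugins", {})
    plugins[plugin_id] = {"enabled": True, "auto_update": auto_update}
    return updated


# Hypothetical existing config with one unrelated plugin.
existing = {"plugins": {"some-other-plugin": {"enabled": False, "auto_update": False}}}
merged = enable_plugin(existing, PLUGIN_ID)
print(json.dumps(merged, indent=2))
```

To apply this to a real file, load it with `json.load`, pass the result through `enable_plugin`, and write it back with `json.dump`.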
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.