tandemn-tuna
Deploy and serve LLM models on GPU. Compare GPU pricing. Launch vLLM on Modal, RunPod, Cerebrium, Cloud Run, Baseten, or Azure with spot instance fallback. OpenAI-compatible inference endpoint.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/choprahetarth/tandemn-tuna
Tuna: Deploy and Serve LLM Models on GPU Infrastructure
Tuna is a hybrid GPU inference orchestrator. It lets you deploy, serve, and manage LLM models (Llama, Qwen, Mistral, DeepSeek, Gemma, and any HuggingFace model) on serverless GPUs from Modal, RunPod, Cerebrium, Google Cloud Run, Baseten, or Azure Container Apps, with optional spot instance fallback on AWS via SkyPilot. Every deployment gets an OpenAI-compatible /v1/chat/completions endpoint.
The key idea: serverless GPUs handle requests immediately (fast cold start, pay-per-second) while a cheaper spot GPU boots in the background. Once spot is ready, traffic shifts there. If spot gets preempted, traffic falls back to serverless automatically. This gives you 3–5x cost savings over pure serverless with zero downtime.
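The routing rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Tuna's router: the `BackendState` class, its field names, and `choose_backend` are hypothetical names introduced here, and the real router may weigh other signals.

```python
from dataclasses import dataclass


@dataclass
class BackendState:
    """Hypothetical health snapshot; these field names are not Tuna's."""
    spot_ready: bool = False      # spot GPU has booted and is serving
    spot_preempted: bool = False  # cloud provider reclaimed the spot instance


def choose_backend(state: BackendState) -> str:
    """Return which backend should receive the next request."""
    # Prefer the cheaper spot GPU, but only while it is actually usable.
    if state.spot_ready and not state.spot_preempted:
        return "spot"
    # Otherwise serverless absorbs the traffic (startup or preemption).
    return "serverless"


# Walk through the lifecycle from the paragraph above.
timeline = [
    BackendState(),                                      # spot still booting
    BackendState(spot_ready=True),                       # spot is up
    BackendState(spot_ready=True, spot_preempted=True),  # spot reclaimed
]
routes = [choose_backend(state) for state in timeline]
print(routes)
```

Serverless is the default branch so that any state other than "spot is healthy" resolves to the backend that is always available.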
Quick Start — Deploy a Model in 3 Commands
# 1. Install tuna
uv pip install tandemn-tuna
# 2. Deploy a model (auto-picks cheapest serverless provider for the GPU)
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --service-name my-llm
# 3. Query your endpoint (shown in deploy output)
curl http://<router-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello!"}]}'
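The same request can be made from Python using only the standard library. This is a minimal sketch: `build_chat_request` and `ask` are helper names introduced here, and the response parsing assumes the endpoint returns the standard OpenAI chat-completions shape (`choices[0].message.content`), which the source implies but does not show.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


# Usage (replace <router-ip> with the address printed by `tuna deploy`):
# print(ask("http://<router-ip>:8080", "Qwen/Qwen3-0.6B", "Hello!"))
```

A generous timeout is used because the first request may hit a cold start.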
For serverless-only (no spot, no AWS needed):
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --serverless-only
All Commands
tuna deploy — Launch a model on GPU
Deploy a model across serverless + spot infrastructure. This is the main command.
tuna deploy --model <HuggingFace-model-ID> --gpu <GPU> [options]
Required arguments:
--model: HuggingFace model ID (e.g., Qwen/Qwen3-0.6B, meta-llama/Llama-3-70b)
--gpu: GPU type (e.g., T4, L4, L40S, A100, H100, B200)
Common options:
--service-name: Name for the deployment (auto-generated if omitted)
--serverless-provider: Force a specific provider: modal, runpod, cloudrun, baseten, azure, cerebrium (default: cheapest available)
--serverless-only: Serverless only, no spot backend or router (no AWS needed)
--gpu-count: Number of GPUs (default: 1)
--tp-size: Tensor parallel size (default: 1)
--max-model-len: Max sequence length (default: 4096)
--spots-cloud: Cloud for spot GPUs: aws or azure (default: aws)
--region: Cloud region for spot instances
--concurrency: Override serverless concurrency limit
--no-scale-to-zero: Keep at least 1 spot replica running
--public: Make endpoint publicly accessible (no auth)
--scaling-policy: Path to YAML with scaling parameters
Provider-specific options:
--gcp-project, --gcp-region: For Cloud Run
--azure-subscription, --azure-resource-group, --azure-region, --azure-environment: For Azure
Examples:
# Deploy Llama 3 on Modal with hybrid spot
tuna deploy --model meta-llama/Llama-3-8b --gpu A100 --serverless-provider modal
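When deployments are scripted, the command line can be assembled programmatically. The sketch below uses only flags listed above; `build_deploy_command` is a hypothetical helper introduced for illustration, not part of Tuna, and it covers only a handful of the options.

```python
import shlex
from typing import List, Optional


def build_deploy_command(
    model: str,
    gpu: str,
    service_name: Optional[str] = None,
    serverless_provider: Optional[str] = None,
    serverless_only: bool = False,
    gpu_count: int = 1,
) -> List[str]:
    """Assemble a `tuna deploy` argument list from the documented flags."""
    cmd = ["tuna", "deploy", "--model", model, "--gpu", gpu]
    if service_name:
        cmd += ["--service-name", service_name]
    if serverless_provider:
        cmd += ["--serverless-provider", serverless_provider]
    if serverless_only:
        cmd.append("--serverless-only")
    if gpu_count != 1:  # 1 is the documented default, so omit it
        cmd += ["--gpu-count", str(gpu_count)]
    return cmd


cmd = build_deploy_command(
    "meta-llama/Llama-3-8b", "A100", serverless_provider="modal"
)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with model IDs or service names.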
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-choprahetarth-tandemn-tuna": {
      "enabled": true,
      "auto_update": true
    }
  }
}