
llamacpp-bench

Run llama.cpp benchmarks on GGUF models to measure prompt processing (pp) and token generation (tg) performance. Use when the user wants to benchmark LLM models, compare model performance, test inference speed, or run llama-bench on GGUF files. Supports Vulkan, CUDA, ROCm, and CPU backends.


Install via CLI (Recommended)

clawhub install openclaw/skills/skills/alexhegit/llamacpp-bench

Run standardized benchmarks on GGUF models using llama.cpp's llama-bench tool.

Quick Start

# Basic benchmark
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99

# With specific backend
LLAMA_BACKEND=vulkan llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99

Benchmark Parameters

Parameter  Description                 Default
-m         Model path (GGUF file)      required
-p         Prompt sizes to test        512
-n         Generation lengths to test  128
-ngl       GPU layers to offload       99
-t         CPU threads                 auto
-dev       Device selection            auto

Standard Test Suite

For consistent comparisons across models, use:

-p 512,1024,2048 -n 128,256 -ngl 99

This tests:

  • Prompt processing: 512, 1024, 2048 tokens
  • Token generation: 128, 256 tokens
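
The suite above yields five tests per model, which llama-bench labels with a pp or tg prefix followed by the token count. A minimal sketch that lists those labels for a given -p / -n pair; `suite_tests` is a hypothetical helper for illustration, not part of llama-bench or this skill:

```shell
# Hypothetical helper: list the test names that a -p / -n pair produces.
# Usage: suite_tests "512,1024,2048" "128,256"
suite_tests() {
    # Split each comma-separated list, then prefix with
    # pp (prompt processing) or tg (token generation).
    printf '%s\n' "$1" | tr ',' '\n' | sed 's/^/pp/'
    printf '%s\n' "$2" | tr ',' '\n' | sed 's/^/tg/'
}

# Prints pp512, pp1024, pp2048, tg128, tg256, one per line.
suite_tests "512,1024,2048" "128,256"
```

Knowing the expected labels up front makes it easier to spot a test that is missing from a results file.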

Interpreting Results

Metric  Meaning                                   Good Performance
pp512   Prompt processing speed at 512 tokens     >1000 t/s
pp1024  Prompt processing speed at 1024 tokens    >1000 t/s
pp2048  Prompt processing speed at 2048 tokens    >1000 t/s
tg128   Token generation speed (128 tokens)       >50 t/s
tg256   Token generation speed (256 tokens)       >50 t/s
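
As a rough illustration, saved llama-bench output can be checked against the thresholds in the table above with a short awk filter. This sketch assumes the default pipe-delimited table output with `test` and `t/s` columns; `rate_results` is a hypothetical name, not something this skill ships:

```shell
# Hypothetical helper: rate llama-bench's default table output against the
# thresholds listed above (pp > 1000 t/s, tg > 50 t/s).
# Usage: rate_results < results.txt
rate_results() {
    awk -F'|' '
        # Until the header row is seen, look for the "test" and "t/s" columns.
        test_col == 0 {
            for (i = 1; i <= NF; i++) {
                cell = $i; gsub(/^ +| +$/, "", cell)
                if (cell == "test") test_col = i
                if (cell == "t/s")  ts_col = i
            }
            next
        }
        # Data rows: keep only test names such as pp512 or tg128.
        {
            name = $test_col; gsub(/ /, "", name)
            if (name !~ /^(pp|tg)[0-9]+$/) next
            speed = $ts_col + 0      # leading number of the t/s cell
            limit = (name ~ /^pp/) ? 1000 : 50
            printf "%s %.2f %s\n", name, speed, (speed > limit) ? "ok" : "below"
        }
    '
}
```

Each output line has the test name, the measured speed, and `ok` or `below`. The thresholds are the ones from the table; what counts as good still depends on the model size and hardware.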

Backend Selection

llama-bench auto-detects available backends. Priority order:

  1. CUDA (NVIDIA GPUs)
  2. ROCm (AMD GPUs)
  3. Vulkan (cross-platform GPU)
  4. CPU (fallback)

To force a backend, set the environment variable shown in Quick Start, or check which backends your build includes:

# Check available backends
llama-bench --help | grep -i "backend\|cuda\|rocm\|vulkan"

Batch Benchmarking

Use the provided script for benchmarking multiple models:

./scripts/benchmark_models.sh /path/to/models/*.gguf

Saving Results

Output can be redirected to a file:

llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > results.txt

Alternatively, use the benchmark script, which saves results to timestamped files automatically.
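
For readers who want to see the shape of such a batch run, here is a minimal sketch of a loop that runs the standard test suite on each model and writes one timestamped file per model. It is not the contents of benchmark_models.sh; `bench_models`, `BENCH_CMD`, and `OUT_DIR` are hypothetical names chosen for illustration:

```shell
# Hypothetical sketch of a batch loop; not the contents of benchmark_models.sh.
# BENCH_CMD and OUT_DIR are illustrative names that can be overridden.
BENCH_CMD="${BENCH_CMD:-llama-bench}"
OUT_DIR="${OUT_DIR:-results}"

bench_models() {
    mkdir -p "$OUT_DIR" || return 1
    stamp="$(date +%Y%m%d-%H%M%S)"
    for model in "$@"; do
        # Skip unmatched globs and missing files instead of failing the batch.
        [ -f "$model" ] || { echo "skip: $model (not a file)" >&2; continue; }
        out="$OUT_DIR/$(basename "$model" .gguf)-$stamp.txt"
        # Standard test suite; one result file per model.
        if "$BENCH_CMD" -m "$model" -p 512,1024,2048 -n 128,256 -ngl 99 > "$out"; then
            echo "$out"
        else
            echo "failed: $model" >&2
        fi
    done
}

# Usage: bench_models /path/to/models/*.gguf
```

The function prints the path of each result file it wrote, so the output can be piped into later processing.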

Common Issues

  1. Out of memory: Reduce -ngl (GPU layers) or test smaller prompt sizes.
  2. Slow CPU performance: Set -t to the number of physical CPU cores.
  3. Backend not found: Check that llama.cpp was built with the desired backend.
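
The first item can be automated by retrying with fewer GPU layers. The sketch below assumes that a failed run, for example one that runs out of memory, exits with a non-zero status. The list of -ngl values, `BENCH_CMD`, and the function name are hypothetical choices, not part of this skill:

```shell
# Hypothetical sketch: retry with fewer GPU layers when a run fails.
# Assumes a failed run (for example, out of memory) exits non-zero.
# BENCH_CMD and the list of -ngl values are illustrative.
BENCH_CMD="${BENCH_CMD:-llama-bench}"

bench_with_ngl_fallback() {
    model="$1"
    for ngl in 99 32 16 0; do
        if "$BENCH_CMD" -m "$model" -p 512 -n 128 -ngl "$ngl"; then
            echo "succeeded with -ngl $ngl" >&2
            return 0
        fi
        echo "run failed with -ngl $ngl, retrying with fewer layers" >&2
    done
    return 1
}

# Usage: bench_with_ngl_fallback model.gguf
```

A non-zero exit can have other causes, such as a bad model path, so check the error output before concluding that memory was the problem.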

Building / Updating llama.cpp

Check Current Version

./scripts/build_llamacpp.sh -v

Shows:

  • Current Git commit and branch
  • Build date
  • Whether behind upstream
  • Available backends

Build or Update

# Interactive mode (prompts for backend selection)
./scripts/build_llamacpp.sh -u

# Specify backend directly
./scripts/build_llamacpp.sh -u -b vulkan   # Vulkan (AMD/Intel GPUs)
./scripts/build_llamacpp.sh -u -b cuda     # CUDA (NVIDIA GPUs)
./scripts/build_llamacpp.sh -u -b rocm     # ROCm (AMD GPUs)
./scripts/build_llamacpp.sh -u -b cpu      # CPU only

# Clean rebuild
./scripts/build_llamacpp.sh -c -b vulkan

# Custom build directory
./scripts/build_llamacpp.sh -u -b cuda -d /custom/path
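
For orientation, the -b values above correspond roughly to CMake options in upstream llama.cpp. The mapping below is an assumption based on recent llama.cpp build documentation. Older checkouts used different option names, and build_llamacpp.sh may configure the build differently. `backend_cmake_flag` is a hypothetical helper:

```shell
# Hypothetical mapping from the -b values above to upstream llama.cpp CMake
# options. Assumption: option names as in recent llama.cpp build docs;
# build_llamacpp.sh itself may do this differently.
backend_cmake_flag() {
    case "$1" in
        vulkan) echo "-DGGML_VULKAN=ON" ;;
        cuda)   echo "-DGGML_CUDA=ON" ;;
        rocm)   echo "-DGGML_HIP=ON" ;;
        cpu)    echo "" ;;   # CPU-only build needs no extra backend option
        *)      echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Usage (not run here):
#   cmake -B build $(backend_cmake_flag vulkan) && cmake --build build --config Release
```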

Build Options

Metadata

Author: @alexhegit
Stars: 4473
Views: 1
Updated: 2026-05-01

Add to Configuration

Paste this into your clawhub.json to enable this plugin.

{
  "plugins": {
    "official-alexhegit-llamacpp-bench": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Safety Note: ClawKit audits metadata but not runtime behavior. Use with caution.