llamacpp-bench

Run standardized benchmarks on GGUF models using llama.cpp's llama-bench tool.

Quick Start

# Basic benchmark
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99

# With specific backend
LLAMA_BACKEND=vulkan llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99

Benchmark Parameters

Parameter	Description	Default
`-m`	Model path (GGUF file)	required
`-p`	Prompt sizes to test	512
`-n`	Generation lengths to test	128
`-ngl`	GPU layers to offload	99
`-t`	CPU threads	auto
`-dev`	Device selection	auto

Standard Test Suite

For consistent comparisons across models, use:

-p 512,1024,2048 -n 128,256 -ngl 99

This tests:

Prompt processing: 512, 1024, 2048 tokens
Token generation: 128, 256 tokens

Interpreting Results

Metric	Meaning	Good Performance
`pp512`	Prompt processing speed at 512 tokens	>1000 t/s
`pp1024`	Prompt processing speed at 1024 tokens	>1000 t/s
`pp2048`	Prompt processing speed at 2048 tokens	>1000 t/s
`tg128`	Token generation speed (128 tokens)	>50 t/s
`tg256`	Token generation speed (256 tokens)	>50 t/s

Backend Selection

llama-bench auto-detects available backends. Priority order:

CUDA (NVIDIA GPUs)
ROCm (AMD GPUs)
Vulkan (cross-platform GPU)
CPU (fallback)

To force a backend, set environment variable or check build:

# Check available backends
llama-bench --help | grep -i "backend\|cuda\|rocm\|vulkan"

Batch Benchmarking

Use the provided script for benchmarking multiple models:

./scripts/benchmark_models.sh /path/to/models/*.gguf

Saving Results

Output can be redirected to a file:

llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > results.txt

Or use the benchmark script which auto-saves to timestamped files.

Common Issues

Out of memory: Reduce -ngl (GPU layers) or test smaller prompt sizes
Slow CPU performance: Ensure -t matches CPU core count
Backend not found: Check llama.cpp was built with the desired backend

Building / Updating llama.cpp

Check Current Version

./scripts/build_llamacpp.sh -v

Shows:

Current Git commit and branch
Build date
Whether behind upstream
Available backends

Build or Update

# Interactive mode (prompts for backend selection)
./scripts/build_llamacpp.sh -u

# Specify backend directly
./scripts/build_llamacpp.sh -u -b vulkan   # Vulkan (AMD/Intel GPUs)
./scripts/build_llamacpp.sh -u -b cuda     # CUDA (NVIDIA GPUs)
./scripts/build_llamacpp.sh -u -b rocm     # ROCm (AMD GPUs)
./scripts/build_llamacpp.sh -u -b cpu      # CPU only

# Clean rebuild
./scripts/build_llamacpp.sh -c -b vulkan

# Custom build directory
./scripts/build_llamacpp.sh -u -b cuda -d /custom/path

llamacpp-bench

Install via CLI (Recommended)

llamacpp-bench

Quick Start

Benchmark Parameters

Standard Test Suite

Interpreting Results

Backend Selection

Batch Benchmarking

Saving Results

Common Issues

Building / Updating llama.cpp

Check Current Version

Build or Update

Build Options

Metadata

Related Skills

rocm_vllm_deployment