FitLLM

Why naive VRAM calculators are wrong on modern LLMs

Gemma 4 31B: real KV cache = 20.78 GiB; naive calculators report ~11.1× more

Computed with the open-source FitLLM engine, which models the KV cache layer by layer instead of relying on a naive estimate. Updated 2026-06.

The shortcut that used to work

For years, "VRAM needed = parameters × bytes-per-param + a flat KV estimate" was close enough. On 2025–2026 models it is off by multiples, because their architectures break the assumptions behind that formula. Here are four ways it goes wrong, with the receipts.
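The old shortcut is just two lines of arithmetic. Below is a minimal Python sketch of it, assuming an fp16 (2-byte) KV cache and modeling the "flat" KV term as the uniform formula that treats every layer identically at full context. The function names and the 8B example shape are hypothetical illustrations, not fitllm-engine's API or any specific model's config.

```python
GIB = 1024 ** 3  # bytes per GiB

def naive_weights_gib(params: float, bytes_per_param: float) -> float:
    # Weights: parameter count times storage size per parameter
    return params * bytes_per_param / GIB

def naive_kv_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: int = 2) -> float:
    # Uniform estimate: every layer stores K and V for every token in the context
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem / GIB

# Hypothetical dense 8B model: 32 layers, 8 KV heads, head_dim 128, fp16 cache, 8K context
print(naive_weights_gib(8e9, 2))       # fp16 weights, ≈ 14.9
print(naive_kv_gib(32, 8, 128, 8192))  # → 1.0
```

Every section below breaks one assumption baked into `naive_kv_gib` or `naive_weights_gib`.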

1. Sliding-window attention caps the KV cache

Gemma 4 interleaves sliding-window (local) and global attention layers in a 5:1 ratio. Local layers attend only to the last ~1024 tokens, so their KV cache stops growing once the context passes the window. At long context nearly all of the cache lives in the 1-in-6 global layers, so counting every layer at full context overestimates KV by ~6×.
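A minimal sketch of the window cap, assuming a hypothetical 60-layer model with one global layer after every five local ones, a 1024-token window, 128K context, and (to isolate this effect) the same KV-head count and head_dim on every layer. The function names and shapes are illustrative, not Gemma 4's actual config.

```python
GIB = 1024 ** 3

def kv_per_token_per_layer(n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    # K and V each hold n_kv_heads * head_dim values per cached token
    return 2 * n_kv_heads * head_dim * bytes_per_elem

def swa_kv_gib(n_layers: int, local_per_global: int, window: int, context: int,
               n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> float:
    per_tok = kv_per_token_per_layer(n_kv_heads, head_dim, bytes_per_elem)
    n_global = n_layers // (local_per_global + 1)  # one global layer per group
    n_local = n_layers - n_global
    local_tokens = min(context, window)            # local cache is capped at the window
    return per_tok * (n_local * local_tokens + n_global * context) / GIB

def uniform_kv_gib(n_layers: int, context: int, n_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> float:
    # The naive version: every layer caches the full context
    return kv_per_token_per_layer(n_kv_heads, head_dim, bytes_per_elem) * n_layers * context / GIB

real = swa_kv_gib(60, 5, 1024, 131072, n_kv_heads=8, head_dim=256)
naive = uniform_kv_gib(60, 131072, n_kv_heads=8, head_dim=256)
print(f"overestimate: {naive / real:.2f}x")  # → overestimate: 5.77x
```

As context grows, the local layers' share shrinks toward zero and the ratio approaches 6, the inverse of the 1-in-6 global fraction. Below the window the two estimates agree, since every layer then caches the whole context.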

2. Hybrid / linear attention has no growing KV

Qwen 3.6 is hybrid: only a fraction of layers use full attention; the rest are linear and keep no per-token KV cache. A calculator that counts every layer at full context is counting cache that doesn't exist.
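A minimal sketch of the hybrid case. The layout (48 layers, one full-attention layer in every 4), the KV-head count and the head_dim are hypothetical, not Qwen 3.6's real config, and `linear_state_bytes` is a hypothetical placeholder for the fixed-size state a linear-attention layer carries in place of a per-token cache.

```python
GIB = 1024 ** 3

def hybrid_kv_gib(n_layers: int, full_attn_every: int, context: int,
                  n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2,
                  linear_state_bytes: int = 0) -> float:
    n_full = n_layers // full_attn_every        # only these keep a per-token KV cache
    n_linear = n_layers - n_full
    per_tok = 2 * n_kv_heads * head_dim * bytes_per_elem
    growing = n_full * context * per_tok        # scales with context length
    fixed = n_linear * linear_state_bytes       # fixed-size state, independent of context
    return (growing + fixed) / GIB

real = hybrid_kv_gib(48, 4, 32768, n_kv_heads=2, head_dim=256)
naive = hybrid_kv_gib(48, 1, 32768, n_kv_heads=2, head_dim=256)  # every layer treated as full attention
print(real, naive)  # → 0.75 3.0
```

Only the full-attention term grows with context, so here the naive figure is 4× too high at any length.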

3. Global layers use a different head_dim

In Gemma 4 the global-attention layers use head_dim 512 with fewer KV heads, while the sliding-window layers use head_dim 256. No single uniform head_dim can be right for both layer types at once.
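A minimal sketch of summing KV per layer type, each with its own head count and head_dim. The head_dim values 256 and 512 come from the text above; the KV-head counts (8 sliding, 2 global), the 50/10 layer split and the `LayerGroup` type are hypothetical.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

@dataclass
class LayerGroup:
    count: int          # number of layers sharing this shape
    n_kv_heads: int
    head_dim: int
    cached_tokens: int  # window for sliding layers, full context for global layers

def kv_gib(groups: list[LayerGroup], bytes_per_elem: int = 2) -> float:
    # Each group contributes K and V with its own head count and head_dim
    total = sum(2 * g.count * g.n_kv_heads * g.head_dim * g.cached_tokens * bytes_per_elem
                for g in groups)
    return total / GIB

context, window = 131072, 1024
per_type = [LayerGroup(50, 8, 256, window),    # sliding: head_dim 256
            LayerGroup(10, 2, 512, context)]   # global: head_dim 512, fewer KV heads
as_sliding = [LayerGroup(50, 8, 256, window), LayerGroup(10, 8, 256, context)]
as_global = [LayerGroup(50, 2, 512, window), LayerGroup(10, 2, 512, context)]

print(kv_gib(per_type), kv_gib(as_sliding), kv_gib(as_global))
# → 5.390625 10.390625 5.1953125
```

With these made-up shapes, borrowing the sliding-layer shape for every layer nearly doubles the estimate, while borrowing the global shape undercounts the sliding layers.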

4. MoE: total params resident, active params for compute

Qwen 3.6 35B-A3B activates ~3B params per token, but all 35B must sit in memory. Size weight memory from active params and you'll think it fits when it doesn't; size the KV cache from all layers and you'll think it won't fit when it does.
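A minimal sketch of the MoE trap, using the 35B total / ~3B active figures from the text. The flat 4-bit quantization (ignoring scale overhead), the 12 GiB GPU and the 1 GiB KV cache are hypothetical numbers chosen for illustration.

```python
GIB = 1024 ** 3

def weights_gib(params: float, bits_per_param: float) -> float:
    # Resident weight memory for a given parameter count and bit width
    return params * bits_per_param / 8 / GIB

def fits(vram_gib: float, weights: float, kv: float, overhead: float = 0.0) -> bool:
    return weights + kv + overhead <= vram_gib

total_params, active_params = 35e9, 3e9
bits = 4     # hypothetical flat 4-bit quantization, ignoring scale/zero-point overhead
vram = 12.0  # hypothetical 12 GiB GPU
kv = 1.0     # hypothetical KV-cache size in GiB

resident = weights_gib(total_params, bits)  # all experts must be loaded
active = weights_gib(active_params, bits)   # what an active-params calculator would use

print(f"{resident:.1f} GiB vs {active:.1f} GiB")         # → 16.3 GiB vs 1.4 GiB
print(fits(vram, active, kv), fits(vram, resident, kv))  # → True False
```

Active params still matter, but for per-token compute and speed, not for whether the weights fit.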

Audit it yourself

The whole engine is a single, readable, MIT-licensed file: fitllm-engine. Every number on this site is computed from official model config.json values, with no guessing. See the per-model fit pages, or run the calculator:

Open the FitLLM calculator
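To make "computed from config.json values" concrete, here is a minimal sketch that derives a per-layer KV estimate from a parsed config. It assumes Hugging Face-style keys (num_hidden_layers, num_key_value_heads, head_dim, sliding_window, and a per-layer layer_types list); real configs differ by model family (a separate head_dim for global layers, for example, would need its own key), the values below are made up, and the code is not taken from fitllm-engine.

```python
import json

GIB = 1024 ** 3

# Made-up config in Hugging Face config.json style (not any real model's values)
TOY_CONFIG = """
{
  "num_hidden_layers": 6,
  "num_key_value_heads": 8,
  "head_dim": 256,
  "sliding_window": 1024,
  "layer_types": ["sliding_attention", "sliding_attention", "sliding_attention",
                  "sliding_attention", "sliding_attention", "full_attention"]
}
"""

def kv_gib_from_config(cfg: dict, context: int, bytes_per_elem: int = 2) -> float:
    per_tok = 2 * cfg["num_key_value_heads"] * cfg["head_dim"] * bytes_per_elem
    total = 0
    for kind in cfg["layer_types"]:
        if kind == "sliding_attention":
            total += per_tok * min(context, cfg["sliding_window"])  # capped at the window
        elif kind == "full_attention":
            total += per_tok * context                              # grows with context
        # Any other type (e.g. linear attention) adds no per-token KV in this sketch
    return total / GIB

cfg = json.loads(TOY_CONFIG)
assert len(cfg["layer_types"]) == cfg["num_hidden_layers"]
print(kv_gib_from_config(cfg, 131072))  # → 1.0390625
```

Reading shapes from the config per layer, rather than assuming one shape for all, is what lets the window cap and hybrid layouts show up in the result automatically.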

All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates: real usage varies with runtime (llama.cpp / MLX / Ollama), driver, and display. Found a mismatch? Report it.