Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-06.
FitLLM compares fit, meaning what loads in memory, computed from each model's official config.json. The memory figures are a floor, not a guarantee; speed and power draw are not estimated.
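As a rough illustration of what per-layer KV-cache modeling can mean, the sketch below sums the cache layer by layer from a config.json-style dictionary, capping sliding-window layers at their window size. It is not the FitLLM engine's code. The key names (`num_hidden_layers`, `num_key_value_heads`, `head_dim`, `sliding_window`, `layer_types`) follow common Hugging Face conventions and are assumptions for any particular model, and `EXAMPLE_CONFIG` is invented for the example. Models with linear-attention or other non-standard layers would need extra handling.

```python
def kv_cache_bytes(config: dict, context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size in bytes by summing over layers.

    Assumes standard attention: each layer stores one key and one value
    vector per KV head per cached token. bytes_per_elem=2 means FP16/BF16.
    """
    n_layers = config["num_hidden_layers"]
    n_heads = config["num_attention_heads"]
    n_kv_heads = config.get("num_key_value_heads") or n_heads
    head_dim = config.get("head_dim") or config["hidden_size"] // n_heads
    window = config.get("sliding_window")
    # If the config does not list layer types, treat every layer as full attention.
    layer_types = config.get("layer_types") or ["full_attention"] * n_layers

    total = 0
    for kind in layer_types:
        cached = context_tokens
        if kind == "sliding_attention" and window:
            # A sliding-window layer never caches more than `window` tokens.
            cached = min(context_tokens, window)
        total += 2 * n_kv_heads * head_dim * cached * bytes_per_elem  # 2 = K and V
    return total


# Hypothetical config, invented for illustration (not a real model).
EXAMPLE_CONFIG = {
    "num_hidden_layers": 4,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "hidden_size": 2048,
    "sliding_window": 1024,
    "layer_types": [
        "sliding_attention", "sliding_attention",
        "full_attention", "full_attention",
    ],
}

if __name__ == "__main__":
    per_layer = kv_cache_bytes(EXAMPLE_CONFIG, 4096)
    naive = kv_cache_bytes(dict(EXAMPLE_CONFIG, layer_types=None), 4096)
    print(f"per-layer: {per_layer / 1e6:.1f} MB, naive: {naive / 1e6:.1f} MB")
```

In this toy config, a naive estimate that treats every layer as full attention overstates the cache at 4,096 tokens, because half of the layers keep at most 1,024 tokens.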
| | RTX 4090 | RTX 5080 |
|---|---|---|
| VRAM | 24 GB | 16 GB |
| Memory bandwidth (speed, not estimated) | 1008 GB/s | 960 GB/s |

| Model | RTX 4090 | RTX 5080 |
|---|---|---|
| Qwen 3.6 35B-A3B | ❌ won't fit · 24.8/24 GB | ❌ won't fit · 24.8/16 GB |
| Qwen 3.6 27B | ✅ up to 34K · 20.2/24 GB | ❌ won't fit · 20.2/16 GB |
| Gemma 4 31b | ⚠️ up to 3K · 23.5/24 GB | ❌ won't fit · 23.5/16 GB |
| Gemma 4 26b A4B | ✅ up to 83K · 18.9/24 GB | ❌ won't fit · 18.9/16 GB |
| Gemma 4 12b | ✅ up to 262K · 10.4/24 GB | ✅ up to 110K · 10.4/16 GB |
Only the RTX 4090 runs: Qwen 3.6 27B and Gemma 4 26b A4B. Gemma 4 31b also loads only on the RTX 4090, but with context limited to about 3K.
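The "up to N" context figures can be read as a budget calculation: whatever VRAM is left after the model loads is divided by the cache cost of one token. The sketch below shows that arithmetic under simplifying assumptions and is not the FitLLM engine's method. The function name, the `overhead_gb` parameter, and the 100,000 bytes-per-token figure are hypothetical, and a constant per-token cost ignores sliding-window layers, so it will not reproduce the table's numbers.

```python
def max_context_tokens(
    vram_gb: float,
    model_gb: float,
    kv_bytes_per_token: int,
    overhead_gb: float = 0.0,
) -> int:
    """Return how many tokens of KV cache fit in the VRAM left after loading.

    Uses decimal gigabytes (1 GB = 1e9 bytes). Returns 0 when the model
    alone does not fit.
    """
    # round() guards against floating-point noise such as 3.8000000000000007.
    free_bytes = round((vram_gb - model_gb - overhead_gb) * 1e9)
    if free_bytes <= 0:
        return 0
    return free_bytes // kv_bytes_per_token


if __name__ == "__main__":
    # Hypothetical per-token cache cost, chosen only to make the arithmetic visible.
    KV_BYTES_PER_TOKEN = 100_000
    for card, vram in [("24 GB card", 24), ("16 GB card", 16)]:
        tokens = max_context_tokens(vram, 20.2, KV_BYTES_PER_TOKEN)
        print(f"{card}: {tokens} tokens")
```

With a 20.2 GB model, the 16 GB card has no room left and the function returns 0, which corresponds to a "won't fit" cell.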
For local LLMs, more VRAM means more models and longer context. Match the card to the model you actually want to run — see the per-model fit pages.
All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates; real usage varies with the runtime (llama.cpp / MLX / Ollama), the driver, and display use. Found a mismatch? Report it.