Computed with the open-source FitLLM engine, which models the KV cache layer by layer rather than using a naive whole-model estimate. Updated 2026-06.
FitLLM compares fit (what loads into GPU memory), computed from each model's official config.json. The memory figures are a floor, not a guarantee: actual usage can be higher. Speed and power are not estimated.
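To illustrate what per-layer KV-cache modeling means, here is a minimal Python sketch. It is not fitllm-engine's code. The field names follow common Hugging Face config.json keys (`num_hidden_layers`, `num_key_value_heads`, `head_dim`, `sliding_window`), `sliding_layers` is a hypothetical parameter marking which layers use a sliding window, and the default assumes an fp16 cache (2 bytes per element).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AttentionConfig:
    """Subset of config.json fields that determine KV-cache size (names assumed)."""
    num_hidden_layers: int
    num_key_value_heads: int
    head_dim: int
    sliding_window: Optional[int] = None          # window size in tokens, if any
    sliding_layers: FrozenSet[int] = frozenset()  # hypothetical: indices of windowed layers


def kv_cache_bytes(cfg: AttentionConfig, context: int, bytes_per_elem: int = 2) -> int:
    """Sum the K and V cache size layer by layer for a given context length."""
    total = 0
    for layer in range(cfg.num_hidden_layers):
        # Sliding-window layers only keep the last `sliding_window` tokens.
        if layer in cfg.sliding_layers and cfg.sliding_window is not None:
            tokens = min(context, cfg.sliding_window)
        else:
            tokens = context
        # Factor 2 = one K tensor and one V tensor per layer.
        total += 2 * cfg.num_key_value_heads * cfg.head_dim * tokens * bytes_per_elem
    return total
```

If every layer is treated as full attention, the cache grows linearly with context. For models that interleave sliding-window and full-attention layers, the windowed layers stop growing once the context exceeds the window, so a per-layer calculation can report a much smaller cache than a naive one for the same config.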
| Spec | RTX 5080 | RTX 4080 SUPER |
|---|---|---|
| VRAM | 16 GB | 16 GB |
| Memory bandwidth (reference only; speed not estimated) | 960 GB/s | 736 GB/s |

| Model | RTX 5080 | RTX 4080 SUPER |
|---|---|---|
| Qwen 3.6 35B-A3B | ❌ won't fit · 24.8/16 GB | ❌ won't fit · 24.8/16 GB |
| Qwen 3.6 27B | ❌ won't fit · 20.2/16 GB | ❌ won't fit · 20.2/16 GB |
| Gemma 4 31b | ❌ won't fit · 23.5/16 GB | ❌ won't fit · 23.5/16 GB |
| Gemma 4 26b A4B | ❌ won't fit · 18.9/16 GB | ❌ won't fit · 18.9/16 GB |
| Gemma 4 12b | ✅ fits up to 110K context · 10.4/16 GB | ✅ fits up to 110K context · 10.4/16 GB |
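A verdict such as "up to 110K" can be read as the largest context whose KV cache still fits in the VRAM left after loading the weights. The sketch below shows that arithmetic under simplifying assumptions: a fixed per-token KV cost (all layers full attention), a hypothetical runtime overhead term, decimal gigabytes, and a hypothetical 4,096-token minimum for a "fits" verdict. It illustrates the idea and is not FitLLM's actual rule; the numbers in any usage are made up, not taken from the table.

```python
GB = 10**9  # decimal gigabytes (assumption; the table's unit convention is not stated)


def max_context_tokens(vram_bytes: int, weight_bytes: int, overhead_bytes: int,
                       kv_bytes_per_token: int) -> int:
    """Largest context whose KV cache fits in the memory left after weights and overhead."""
    free = vram_bytes - weight_bytes - overhead_bytes
    if free <= 0:
        return 0  # the weights alone do not fit
    return free // kv_bytes_per_token


def fit_verdict(vram_bytes: int, weight_bytes: int, overhead_bytes: int,
                kv_bytes_per_token: int, min_context: int = 4096) -> str:
    """Return a table-style verdict; min_context is a hypothetical threshold."""
    ctx = max_context_tokens(vram_bytes, weight_bytes, overhead_bytes, kv_bytes_per_token)
    if ctx < min_context:
        return "won't fit"
    return f"fits up to {ctx // 1000}K"
```

Dividing by a constant only works when the KV cost is linear in context. With sliding-window layers the cost flattens out past the window, so a per-layer model would need to search over context lengths instead.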
For model fit, the RTX 5080 and RTX 4080 SUPER are interchangeable: both have 16 GB of VRAM, so every model gets the same verdict on either card. The RTX 5080 has more memory bandwidth (960 vs 736 GB/s, about 30% more), which generally matters for generation speed, but FitLLM only models fit, not speed or power draw. For local LLMs, choose between these two on speed, price and power, not on what fits.
All numbers are computed by the open-source fitllm-engine (MIT license) from official model config.json values, so you can reproduce or audit them yourself. They remain estimates: real usage varies with the runtime (llama.cpp, MLX, Ollama), the driver, and whether the GPU is also driving a display. Found a mismatch? Report it.