Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-06.
FitLLM compares fit (what loads in memory), computed from each model's official config.json. The memory figures are a floor, not a guarantee; speed and power draw are not estimated.
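To make "per-layer KV-cache modeling" concrete, here is a minimal sketch of how a per-layer estimate can differ from a naive one. It is not the FitLLM engine's code. The function name `kv_cache_bytes`, the `layer_windows` list, and every number in the example are hypothetical; the sketch assumes that some layers use sliding-window attention and therefore cache at most `window` tokens, and that K and V are stored at 2 bytes per element.

```python
def kv_cache_bytes(context_len, layer_windows, num_kv_heads, head_dim, bytes_per_elem=2):
    """Sum K and V cache storage layer by layer.

    layer_windows holds one entry per layer: None for a full-attention layer,
    or an int giving the sliding-window size (hypothetical layout).
    """
    total = 0
    for window in layer_windows:
        # A sliding-window layer never caches more tokens than its window.
        cached_tokens = context_len if window is None else min(context_len, window)
        # Factor 2 covers the separate K and V tensors.
        total += 2 * cached_tokens * num_kv_heads * head_dim * bytes_per_elem
    return total


# Hypothetical 4-layer model: three sliding-window layers and one full-attention layer.
layers = [1024, 1024, 1024, None]
per_layer = kv_cache_bytes(8192, layers, num_kv_heads=8, head_dim=128)
naive = kv_cache_bytes(8192, [None] * 4, num_kv_heads=8, head_dim=128)
print(per_layer, naive)
```

In this toy case the naive estimate, which treats every layer as full attention, comes out roughly three times larger than the per-layer sum. That gap is the kind of difference a per-layer model is meant to capture.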
| | RTX 5090 | RTX 5080 |
|---|---|---|
| VRAM | 32 GB | 16 GB |
| Memory bandwidth (speed, not estimated) | 1792 GB/s | 960 GB/s |

| Model | RTX 5090 | RTX 5080 |
|---|---|---|
| Qwen 3.6 35B-A3B | ✅ up to 117K · 24.8/32 GB | ❌ won't fit · 24.8/16 GB |
| Qwen 3.6 27B | ✅ up to 110K · 20.2/32 GB | ❌ won't fit · 20.2/16 GB |
| Gemma 4 31b | ✅ up to 67K · 23.5/32 GB | ❌ won't fit · 23.5/16 GB |
| Gemma 4 26b A4B | ✅ up to 229K · 18.9/32 GB | ❌ won't fit · 18.9/16 GB |
| Gemma 4 12b | ✅ up to 262K · 10.4/32 GB | ✅ up to 110K · 10.4/16 GB |
Only the RTX 5090 runs: Qwen 3.6 35B-A3B, Qwen 3.6 27B, Gemma 4 31b, Gemma 4 26b A4B.
For local LLMs, more VRAM means more models and longer context. Match the card to the model you actually want to run — see the per-model fit pages.
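The link between VRAM and context length can be sketched as simple budget arithmetic. This is an illustration under stated assumptions, not the FitLLM engine's method. The helper `max_context_tokens`, the fixed footprint of 20 GB, and the per-token KV cost of 100,000 bytes are all hypothetical, and a constant per-token cost only holds for models whose layers all use full attention.

```python
GB = 10**9  # assumption: decimal gigabytes


def max_context_tokens(vram_bytes, fixed_bytes, kv_bytes_per_token):
    """Return how many tokens of KV cache fit after the fixed footprint.

    fixed_bytes stands for weights plus runtime overhead (hypothetical split).
    Returns 0 when the fixed footprint alone does not fit.
    """
    free = vram_bytes - fixed_bytes
    if free <= 0:
        return 0
    return free // kv_bytes_per_token


# Hypothetical model: 20 GB fixed footprint, 100,000 bytes of KV cache per token.
for name, vram_gb in [("32 GB card", 32), ("16 GB card", 16)]:
    tokens = max_context_tokens(vram_gb * GB, 20 * GB, 100_000)
    print(name, "fits" if tokens > 0 else "won't fit", tokens)
```

With these made-up numbers, the 32 GB card has 12 GB left for the cache and the 16 GB card cannot load the model at all. That mirrors the pattern in the table, where the same model either fits with a context ceiling or does not fit.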
All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates; real usage varies with the runtime (llama.cpp / MLX / Ollama), the driver, and display overhead. Found a mismatch? Report it.