Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-06.
FitLLM compares fit, meaning what loads in memory, computed from each model's official config.json. The memory figures are a floor, not a guarantee; speed and power draw are not estimated.
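To illustrate what per-layer KV-cache modeling can mean in practice, here is a minimal Python sketch. It is not the fitllm-engine's actual code. It assumes a Hugging Face style config.json with the fields `num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`, `hidden_size`, `sliding_window`, and `layer_types`, and it assumes a 16-bit cache (2 bytes per value). The function name `kv_cache_bytes` and the `toy_config` values are hypothetical and do not describe any model in the table.

```python
def kv_cache_bytes(config: dict, context_tokens: int, bytes_per_value: int = 2) -> int:
    """Sum the KV-cache size layer by layer (illustrative sketch only)."""
    n_layers = config["num_hidden_layers"]
    n_heads = config["num_attention_heads"]
    # Grouped-query attention stores fewer KV heads than attention heads.
    n_kv_heads = config.get("num_key_value_heads", n_heads)
    head_dim = config.get("head_dim") or config["hidden_size"] // n_heads
    window = config.get("sliding_window")
    layer_types = config.get("layer_types") or ["full_attention"] * n_layers

    total = 0
    for layer_type in layer_types:
        if layer_type == "sliding_attention" and window:
            # A sliding-window layer keeps at most `window` tokens.
            cached_tokens = min(context_tokens, window)
        else:
            cached_tokens = context_tokens
        # Factor 2 covers keys and values.
        total += 2 * n_kv_heads * head_dim * cached_tokens * bytes_per_value
    return total


# Hypothetical config: three sliding-window layers and one full-attention layer.
toy_config = {
    "num_hidden_layers": 4,
    "num_attention_heads": 8,
    "num_key_value_heads": 2,
    "hidden_size": 1024,
    "sliding_window": 1024,
    "layer_types": [
        "sliding_attention",
        "sliding_attention",
        "sliding_attention",
        "full_attention",
    ],
}

per_layer = kv_cache_bytes(toy_config, context_tokens=8192)
# Naive estimate: treat every layer as full attention.
naive = 2 * 4 * 2 * 128 * 8192 * 2
print(per_layer, naive)
```

In this toy case the per-layer sum is roughly a third of the naive figure, because three of the four layers stop growing once the context exceeds their window.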

| | RTX 5090 | RTX 4090 |
|---|---|---|
| VRAM | 32 GB | 24 GB |
| Memory bandwidth (affects speed, which is not estimated) | 1792 GB/s | 1008 GB/s |

| Model | RTX 5090 | RTX 4090 |
|---|---|---|
| Qwen 3.6 35B-A3B | ✅ up to 117K · 24.8/32 GB | ❌ won't fit · 24.8/24 GB |
| Qwen 3.6 27B | ✅ up to 110K · 20.2/32 GB | ✅ up to 34K · 20.2/24 GB |
| Gemma 4 31b | ✅ up to 67K · 23.5/32 GB | ⚠️ up to 3K · 23.5/24 GB |
| Gemma 4 26b A4B | ✅ up to 229K · 18.9/32 GB | ✅ up to 83K · 18.9/24 GB |
| Gemma 4 12b | ✅ up to 262K · 10.4/32 GB | ✅ up to 262K · 10.4/24 GB |
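
Only the RTX 5090 runs Qwen 3.6 35B-A3B and Gemma 4 31b at a usable context length: the first does not fit on the RTX 4090 at all, and the second fits with only about 3K tokens of context.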
For local LLMs, more VRAM means more models and longer context. Match the card to the model you actually want to run — see the per-model fit pages.
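The link between VRAM and context length can be sketched as simple arithmetic: whatever memory is left after the model loads is divided by the cache cost of one token. The Python below is a rough illustration under stated assumptions, not FitLLM's method. It assumes decimal gigabytes, a constant cache cost per token (which ignores sliding-window layers), and that the first number in a table cell such as 24.8/32 GB is the memory needed before any context is added. The function name and the `kv_bytes_per_token` values are hypothetical.

```python
from typing import Optional

GB = 10**9  # assumption: decimal gigabytes


def max_context_tokens(
    vram_gb: float,
    base_gb: float,
    kv_bytes_per_token: int,
    model_max_tokens: int,
    reserve_gb: float = 0.0,
) -> Optional[int]:
    """Return the longest context that fits, or None if the model does not load."""
    # Round to whole bytes to avoid floating-point drift.
    free_bytes = round((vram_gb - base_gb - reserve_gb) * GB)
    if free_bytes <= 0:
        return None
    # The model's own context limit caps the result regardless of free memory.
    return min(free_bytes // kv_bytes_per_token, model_max_tokens)


# Hypothetical per-token cache costs, chosen only to show the three outcomes.
no_fit = max_context_tokens(24, 24.8, 100_000, 262_144)
limited = max_context_tokens(32, 24.8, 100_000, 262_144)
capped = max_context_tokens(24, 10.4, 20_000, 262_144)
print(no_fit, limited, capped)
```

The three calls show the possible outcomes: the model does not load, free memory limits the context, or the model's own maximum context is the limit.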
All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates; real usage varies with the runtime (llama.cpp / MLX / Ollama), the driver, and whether the card also drives a display. Found a mismatch? Report it.