Computed with the open-source FitLLM engine, which models the KV cache layer by layer instead of using a naive estimate. Updated 2026-06.
FitLLM compares fit (whether a model's weights and KV cache load into VRAM), computed from each model's official config.json. Treat the figures as a floor on memory use, not a guarantee; speed and power are not estimated.
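Per-layer modeling matters because not every layer has to cache the whole context: in models that mix sliding-window and full-attention layers, a sliding-window layer stores at most `window` tokens. The sketch below is an illustration under stated assumptions, not fitllm-engine's code. It reads Hugging Face-style config.json keys (`num_hidden_layers`, `num_key_value_heads`, `head_dim`, `sliding_window`, `layer_types`), assumes a 16-bit cache (2 bytes per element), and uses a hypothetical 6-layer config with made-up numbers.

```python
from typing import Optional


def kv_cache_bytes(config: dict, context: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size summed layer by layer for `context` tokens."""
    n_layers = config["num_hidden_layers"]
    n_kv_heads = config["num_key_value_heads"]
    # Fall back to hidden_size / num_attention_heads when head_dim is absent.
    head_dim = config.get("head_dim") or config["hidden_size"] // config["num_attention_heads"]
    window: Optional[int] = config.get("sliding_window")
    # Assumption: a config without `layer_types` uses full attention everywhere.
    layer_types = config.get("layer_types") or ["full_attention"] * n_layers

    total = 0
    for kind in layer_types:
        # A sliding-window layer never caches more than `window` tokens.
        if kind == "sliding_attention" and window:
            tokens = min(context, window)
        else:
            tokens = context
        # Factor 2: one K and one V tensor per layer.
        total += 2 * n_kv_heads * head_dim * tokens * bytes_per_elem
    return total


def naive_kv_cache_bytes(config: dict, context: int, bytes_per_elem: int = 2) -> int:
    """Naive estimate: every layer caches every token."""
    head_dim = config.get("head_dim") or config["hidden_size"] // config["num_attention_heads"]
    return (2 * config["num_hidden_layers"] * config["num_key_value_heads"]
            * head_dim * context * bytes_per_elem)


# Hypothetical config: 5 sliding-window layers, then 1 full-attention layer.
example = {
    "num_hidden_layers": 6,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "sliding_window": 1024,
    "layer_types": ["sliding_attention"] * 5 + ["full_attention"],
}

MIB = 1024**2
print(kv_cache_bytes(example, 32768) / MIB)        # 148.0
print(naive_kv_cache_bytes(example, 32768) / MIB)  # 768.0
```

With these made-up numbers the naive estimate is more than five times the per-layer figure at a 32K context; below the window size the two agree.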
| Spec | RTX 4090 | RTX 3090 |
|---|---|---|
| VRAM | 24 GB | 24 GB |
| Memory bandwidth (affects speed; speed not estimated) | 1008 GB/s | 936 GB/s |

| Model | RTX 4090 | RTX 3090 |
|---|---|---|
| Qwen 3.6 35B-A3B | ❌ won't fit · 24.8/24 GB | ❌ won't fit · 24.8/24 GB |
| Qwen 3.6 27B | ✅ up to 34K · 20.2/24 GB | ✅ up to 34K · 20.2/24 GB |
| Gemma 4 31b | ⚠️ up to 3K · 23.5/24 GB | ⚠️ up to 3K · 23.5/24 GB |
| Gemma 4 26b A4B | ✅ up to 83K · 18.9/24 GB | ✅ up to 83K · 18.9/24 GB |
| Gemma 4 12b | ✅ up to 262K · 10.4/24 GB | ✅ up to 262K · 10.4/24 GB |
For model fit, the RTX 4090 and RTX 3090 are interchangeable: same 24 GB of VRAM, same verdicts. The RTX 4090's edge is memory bandwidth (1008 vs 936 GB/s), not capacity, so choose between them on speed, price and power draw rather than on what fits. FitLLM only claims fit, not speed.
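The interchangeability follows from how a fit check works: VRAM is its only hardware input, so equal VRAM gives equal verdicts. The sketch below is a simplified, hypothetical check, not fitllm-engine's code. The weights size, overhead and per-token KV cost are made-up inputs, the page's "GB" is treated as GiB, and the per-token KV cost is assumed constant (no sliding-window layers).

```python
GIB = 1024**3  # assumption: the page's "GB" treated as GiB


def max_context(vram_gb: float, weights_gb: float, kv_bytes_per_token: int,
                overhead_gb: float = 0.0) -> int:
    """Largest context (tokens) whose KV cache fits in the VRAM left after weights.

    Returns 0 when weights plus overhead alone exceed VRAM ("won't fit").
    Assumes a constant KV cost per token (no sliding-window layers).
    """
    free_bytes = (vram_gb - weights_gb - overhead_gb) * GIB
    if free_bytes <= 0:
        return 0
    return int(free_bytes // kv_bytes_per_token)


# VRAM is the only hardware input; bandwidth never enters the fit check.
cards = {"RTX 4090": 24, "RTX 3090": 24}

# Hypothetical model: 16 GiB of weights, 128 KiB of KV cache per token, 1 GiB overhead.
for name, vram in cards.items():
    print(name, max_context(vram, weights_gb=16, kv_bytes_per_token=128 * 1024, overhead_gb=1))
# RTX 4090 57344
# RTX 3090 57344
```

Memory bandwidth would only enter a speed estimate, which FitLLM does not make.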
All numbers are computed by the open-source fitllm-engine (MIT license) from official model config.json values, so you can reproduce or audit them yourself. They are estimates; real usage varies with runtime (llama.cpp / MLX / Ollama), driver, and VRAM reserved by the display. Found a mismatch? Report it.