FitLLM

RTX 3090 vs RTX 4080 SUPER for local LLMs

RTX 3090: 3/5 models · RTX 4080 SUPER: 1/5 models (at ~4-bit quantization, 8K context)

Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-06.

FitLLM compares fit, meaning what loads in memory, computed from each model's official config.json. The memory figures are a floor, not a guarantee; speed and power are not estimated.
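As a rough illustration of what a per-layer fit check can look like, here is a minimal Python sketch. It is not the fitllm-engine's code. It assumes Hugging Face style config.json keys (`num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`, `head_dim`, `layer_types`, `sliding_window`), a 16-bit KV cache, about 4.5 bits per weight as a stand-in for "~4-bit", and decimal gigabytes. The function names and the `n_params` argument are hypothetical.

```python
def kv_cache_bytes(config, context_tokens, bytes_per_value=2):
    """Sum the KV-cache size layer by layer for one context length."""
    # Fall back to hidden_size / heads when head_dim is absent (assumption).
    head_dim = config.get("head_dim") or (
        config["hidden_size"] // config["num_attention_heads"]
    )
    kv_heads = config.get("num_key_value_heads", config["num_attention_heads"])
    n_layers = config["num_hidden_layers"]
    # Treat every layer as full attention unless the config says otherwise.
    layer_types = config.get("layer_types") or ["full_attention"] * n_layers
    window = config.get("sliding_window")

    total = 0
    for layer_type in layer_types:
        tokens = context_tokens
        # Sliding-window layers only cache up to `window` tokens.
        if layer_type == "sliding_attention" and window:
            tokens = min(context_tokens, window)
        # Factor 2 covers keys and values.
        total += 2 * kv_heads * head_dim * tokens * bytes_per_value
    return total


def fits(config, n_params, context_tokens, vram_gb, bits_per_weight=4.5):
    """Return (fits, needed_gb) for quantized weights plus KV cache."""
    weight_bytes = n_params * bits_per_weight / 8
    needed = weight_bytes + kv_cache_bytes(config, context_tokens)
    return needed <= vram_gb * 1e9, needed / 1e9
```

This sketch ignores activation buffers, runtime overhead, and memory held by the display, which is one reason a figure like this is a floor rather than a guarantee.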

The two cards

| | RTX 3090 | RTX 4080 SUPER |
| --- | --- | --- |
| VRAM | 24 GB | 16 GB |
| Memory bandwidth (speed, not estimated) | 936 GB/s | 736 GB/s |

What each runs (~4-bit, max context that fits)

| Model | RTX 3090 | RTX 4080 SUPER |
| --- | --- | --- |
| Qwen 3.6 35B-A3B | won't fit · 24.8/24 GB | won't fit · 24.8/16 GB |
| Qwen 3.6 27B | up to 34K · 20.2/24 GB | won't fit · 20.2/16 GB |
| Gemma 4 31b | ⚠️ up to 3K · 23.5/24 GB | won't fit · 23.5/16 GB |
| Gemma 4 26b A4B | up to 83K · 18.9/24 GB | won't fit · 18.9/16 GB |
| Gemma 4 12b | up to 262K · 10.4/24 GB | up to 110K · 10.4/16 GB |

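At 8K context, only the RTX 3090 runs Qwen 3.6 27B and Gemma 4 26b A4B. Gemma 4 31b also loads only on the RTX 3090, but its context tops out at about 3K.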
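The "max context that fits" idea can be sketched as a search for the largest context length whose memory need stays within VRAM. The Python below is an illustration under stated assumptions, not the fitllm-engine's method: `memory_needed_bytes` is a hypothetical callable that must not decrease as context grows, and the toy cost model in the usage note (10 GB of weights plus 100,000 bytes of KV cache per token) is invented for the example and does not describe any model in the table.

```python
def max_context_that_fits(memory_needed_bytes, vram_bytes, model_max_context):
    """Largest context (in tokens) that fits, or 0 if the model never fits.

    memory_needed_bytes: hypothetical callable, context_tokens -> bytes,
    assumed to be non-decreasing in context_tokens.
    """
    if memory_needed_bytes(1) > vram_bytes:
        return 0  # weights alone (plus one token) already exceed VRAM
    lo, hi = 1, model_max_context
    # Binary search for the last context length that still fits.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if memory_needed_bytes(mid) <= vram_bytes:
            lo = mid
        else:
            hi = mid - 1
    return lo


def toy_cost(context_tokens):
    """Invented cost model: fixed weights plus a per-token KV cost."""
    return 10_000_000_000 + 100_000 * context_tokens
```

With this toy cost model, `max_context_that_fits(toy_cost, 24_000_000_000, 262_144)` gives 140000 and the same call with 16,000,000,000 bytes gives 60000. The search form is used instead of a single division because a per-layer cost is not always linear in context, for example when some layers use a sliding window.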

Bottom line

For local LLMs, more VRAM means more models and longer context. Match the card to the model you actually want to run — see the per-model fit pages.

All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates; real usage varies with the runtime (llama.cpp / MLX / Ollama), driver, and display. Found a mismatch? Report it.