FitLLM

RTX 4090 vs RTX 3090 for local LLMs

Same 24 GB VRAM → identical model fit (the difference is speed, which FitLLM doesn't estimate)

Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-06.

FitLLM compares fit, meaning what loads in memory, computed from each model's official config.json. The memory figures are a floor, not a guarantee; speed and power are not estimated.
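To make "per-layer KV-cache modeling" concrete, here is a minimal sketch of one plausible reading of it, not the actual fitllm-engine code. It assumes each layer caches keys and values for either the full context or a sliding window, and that values are stored in 16-bit precision. The Layer type, the function names and the example layer mix are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    """Hypothetical per-layer attention settings, as might be derived from config.json."""
    num_kv_heads: int
    head_dim: int
    window: Optional[int] = None  # sliding-window size in tokens; None means full attention


def kv_cache_bytes(layers: List[Layer], context_tokens: int, bytes_per_value: int = 2) -> int:
    """Sum the KV-cache size layer by layer (assumes 2 bytes per cached value)."""
    total = 0
    for layer in layers:
        # A sliding-window layer never caches more tokens than its window.
        cached = context_tokens if layer.window is None else min(context_tokens, layer.window)
        # Factor 2: one key tensor and one value tensor per layer.
        total += 2 * layer.num_kv_heads * layer.head_dim * cached * bytes_per_value
    return total


def fits(weight_bytes: int, layers: List[Layer], context_tokens: int, vram_bytes: int) -> bool:
    """Floor check only: weights plus KV cache must not exceed VRAM."""
    return weight_bytes + kv_cache_bytes(layers, context_tokens) <= vram_bytes


# Hypothetical model: 16 full-attention layers and 16 layers with a 1024-token window.
mixed = [Layer(8, 128)] * 16 + [Layer(8, 128, window=1024)] * 16
# Naive estimate: treat every layer as full attention.
naive = [Layer(8, 128)] * 32

per_layer_total = kv_cache_bytes(mixed, 8192)
naive_total = kv_cache_bytes(naive, 8192)
```

In this made-up example the per-layer total is a little over half the naive one, which is why a naive estimate can understate the context that fits. The check ignores runtime overhead, activations and display usage, consistent with the figures being a floor.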

The two cards

                                         RTX 4090     RTX 3090
VRAM                                     24 GB        24 GB
Memory bandwidth (speed, not estimated)  1008 GB/s    936 GB/s

What each runs (~4-bit, max context that fits)

Model              RTX 4090                     RTX 3090
Qwen 3.6 35B-A3B   won't fit · 24.8/24 GB       won't fit · 24.8/24 GB
Qwen 3.6 27B       up to 34K · 20.2/24 GB       up to 34K · 20.2/24 GB
Gemma 4 31b        ⚠️ up to 3K · 23.5/24 GB     ⚠️ up to 3K · 23.5/24 GB
Gemma 4 26b A4B    up to 83K · 18.9/24 GB       up to 83K · 18.9/24 GB
Gemma 4 12b        up to 262K · 10.4/24 GB      up to 262K · 10.4/24 GB
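The "max context that fits" column can be thought of as solving for the largest context whose KV cache fits in the VRAM left after the weights. The sketch below is a simplified illustration under stated assumptions, not FitLLM's method: it assumes a constant KV cost per token, an optional reserved headroom, and an optional cap at the model's own context limit. The function name and all input numbers are hypothetical and do not reproduce the table above.

```python
from typing import Optional


def max_context_tokens(
    vram_bytes: int,
    weight_bytes: int,
    kv_bytes_per_token: int,
    reserve_bytes: int = 0,
    model_max_context: Optional[int] = None,
) -> int:
    """Largest context (in tokens) whose KV cache fits after loading the weights.

    Returns 0 when not even a single token fits ("won't fit").
    """
    budget = vram_bytes - weight_bytes - reserve_bytes
    if budget < kv_bytes_per_token:
        return 0
    tokens = budget // kv_bytes_per_token
    if model_max_context is not None:
        # A model cannot use more context than it supports, however much VRAM is free.
        tokens = min(tokens, model_max_context)
    return tokens


GB = 10**9  # assumption: decimal gigabytes

# Two cards with the same VRAM and the same hypothetical model inputs.
card_a = max_context_tokens(24 * GB, 16 * GB, 131072)
card_b = max_context_tokens(24 * GB, 16 * GB, 131072)

# Weights alone exceed VRAM, so nothing fits.
too_big = max_context_tokens(24 * GB, 25 * GB, 131072)
```

Because the calculation depends only on VRAM and the model, two cards with the same capacity always get the same answer. Memory bandwidth never enters it.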

For model fit, the RTX 4090 and RTX 3090 are interchangeable: same VRAM, same verdicts. The cards differ in memory bandwidth (1008 GB/s vs 936 GB/s) and power, not capacity. FitLLM only claims fit, not speed.

Bottom line

For local LLMs these two are interchangeable on capacity — choose on speed, price and power, not on what fits.

All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates; real usage varies with the runtime (llama.cpp / MLX / Ollama), the driver and whether the card also drives a display. Found a mismatch? Report it.