FitLLM

RTX 5080 vs RTX 4080 SUPER for local LLMs

Same 16 GB VRAM → identical model fit (the difference is speed, which FitLLM doesn't estimate)

Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-06.

FitLLM compares fit (what loads in memory), computed from each model's official config.json. The memory figures are a floor, not a guarantee: real usage can be higher. Speed and power are not estimated.
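As a rough illustration of what computing fit from config.json can look like, the sketch below adds quantized weight size to KV-cache size. It is not the fitllm-engine code. The formula is the common uniform-attention estimate, the key names are those typically found in Hugging Face style config.json files, and the toy config and parameter count are invented for the example. It assumes every layer caches the full context, which a per-layer model would not take for granted.

```python
def kv_cache_bytes(config, context_len, bytes_per_value=2):
    """Estimate KV-cache size, assuming every layer caches the full context.

    bytes_per_value=2 assumes an FP16 cache (an assumption, not a FitLLM setting).
    """
    layers = config["num_hidden_layers"]
    heads = config["num_attention_heads"]
    # Grouped-query attention models store fewer KV heads than query heads.
    kv_heads = config.get("num_key_value_heads", heads)
    head_dim = config.get("head_dim", config["hidden_size"] // heads)
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value


def weight_bytes(n_params, bits_per_weight=4.0):
    """Approximate size of the quantized weights (~4-bit by default)."""
    return int(n_params * bits_per_weight / 8)


def estimated_gb(config, n_params, context_len):
    """Weights plus KV cache, in decimal GB. Ignores runtime overhead."""
    total = weight_bytes(n_params) + kv_cache_bytes(config, context_len)
    return total / 1e9


# Hypothetical config, not taken from any model listed on this page.
toy_config = {
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "hidden_size": 4096,
}

print(estimated_gb(toy_config, n_params=8_000_000_000, context_len=8192))
```

Because the result ignores runtime buffers, driver and display usage, it behaves as a lower bound, which matches the "floor, not a guarantee" framing.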

The two cards

Spec                                    | RTX 5080 | RTX 4080 SUPER
VRAM                                    | 16 GB    | 16 GB
Memory bandwidth (speed, not estimated) | 960 GB/s | 736 GB/s

What each runs (~4-bit, max context that fits)

Model            | RTX 5080                | RTX 4080 SUPER
Qwen 3.6 35B-A3B | won't fit · 24.8/16 GB  | won't fit · 24.8/16 GB
Qwen 3.6 27B     | won't fit · 20.2/16 GB  | won't fit · 20.2/16 GB
Gemma 4 31b      | won't fit · 23.5/16 GB  | won't fit · 23.5/16 GB
Gemma 4 26b A4B  | won't fit · 18.9/16 GB  | won't fit · 18.9/16 GB
Gemma 4 12b      | up to 110K · 10.4/16 GB | up to 110K · 10.4/16 GB

For model fit, the RTX 5080 and RTX 4080 SUPER are interchangeable: same VRAM, same verdicts. Any difference between them lies in memory bandwidth (960 vs 736 GB/s) and power, not capacity. FitLLM only claims fit, not speed.
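The "same VRAM, same verdicts" point can be sketched in a few lines: if a verdict depends only on the required memory and the card's VRAM, two 16 GB cards must agree on every row. The `verdict` helper is hypothetical and only mimics the wording used in the table; the required-GB and context figures are copied from the table rather than computed, and the "fits" fallback string is invented for the sketch.

```python
def verdict(required_gb, vram_gb, max_context_k=None):
    """Format a fit verdict in the style used on this page."""
    usage = f"{required_gb:g}/{vram_gb:g} GB"
    if required_gb > vram_gb:
        return f"won't fit · {usage}"
    if max_context_k is None:
        return f"fits · {usage}"  # fallback wording, not used on the page
    return f"up to {max_context_k}K · {usage}"


# VRAM per card, from the spec table.
cards = {"RTX 5080": 16, "RTX 4080 SUPER": 16}

# (model, required GB, max context in K tokens), copied from the fit table.
rows = [
    ("Qwen 3.6 27B", 20.2, None),
    ("Gemma 4 12b", 10.4, 110),
]

for model, required_gb, context_k in rows:
    results = {card: verdict(required_gb, vram, context_k)
               for card, vram in cards.items()}
    print(model, results)
```

Memory bandwidth never enters the function, which is why the two columns of the table are identical.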

Bottom line

For local LLMs these two are interchangeable on capacity — choose on speed, price and power, not on what fits.

All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates; real usage varies with runtime (llama.cpp / MLX / Ollama), driver and display.