Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-06.
Hardware is ranked by memory size (a proxy for cost and availability), not by price. The figures are a floor, not a guarantee: leave headroom for your runtime and OS.
~4-bit ≈ Q4_K_M (GGUF, llama.cpp) on GPU / 4-bit (MLX) on Mac · ~8-bit ≈ Q8_0 on GPU / 8-bit (MLX) on Mac · KV cache in F16 · "full" = the model's max context (262K).
| Setup | Smallest GPU (used / total) | Smallest Mac (used / total) |
|---|---|---|
| ~4-bit · 8K ctx | ✅ RTX 3090 · 20.2/24 GB | ✅ M5 Pro 48GB · 23.0/48 GB |
| ~4-bit · full (262K) | 🔴 none | ✅ M5 Max 64GB · 48.5/64 GB |
| ~8-bit · 33K ctx | 🔴 none | ✅ M5 Max 64GB · 39.7/64 GB |
| ~8-bit · full (262K) | 🔴 none | ✅ M5 Max 128GB · 62.6/128 GB |

| Hardware | Memory (VRAM / unified) | Max context (~4-bit) | Used @ 8K ctx (~4-bit) |
|---|---|---|---|
| RTX 3060 12GB | 12 GB | ❌ won't fit | 20.2 / 12 GB |
| RTX 4080 SUPER | 16 GB | ❌ won't fit | 20.2 / 16 GB |
| RTX 5080 | 16 GB | ❌ won't fit | 20.2 / 16 GB |
| RTX 3090 | 24 GB | ✅ up to 34K | 20.2 / 24 GB |
| RTX 4090 | 24 GB | ✅ up to 34K | 20.2 / 24 GB |
| RTX 5090 | 32 GB | ✅ up to 110K | 20.2 / 32 GB |
| M5 Pro 48GB | 48 GB | ✅ up to 162K | 23.0 / 48 GB |
| M5 Max 64GB | 64 GB | ✅ up to 262K | 23.0 / 64 GB |
| M5 Max 128GB | 128 GB | ✅ up to 262K | 23.0 / 128 GB |
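The "Max context" column behaves roughly like a straight line: usage at the 8K reference point plus a fixed cost per extra token, capped at the model's maximum. The minimal Python sketch below assumes exactly that. It takes its per-token slope from the Mac ~4-bit figures above (23.0 GB at 8K, 48.5 GB at the full 262K), assumes 8K = 8,192 and 262K = 262,144 tokens, and assumes the per-token cost is the same on GPU and Mac because both use an F16 KV cache. The usable fractions (0.95 of VRAM, 0.8 of unified memory) are hypothetical headroom values chosen for illustration, not fitllm-engine's documented rule; with them the sketch lands close to the table's 34K, 110K and 162K figures.

```python
# Minimal sketch: estimate max context from a linear memory model.
# Assumptions (not fitllm-engine's actual rules): usage grows linearly with
# context, the per-token cost is the same on GPU and Mac (F16 KV cache), and
# the usable fractions below are hypothetical headroom values.

REF_CTX = 8_192          # assumed token count behind "8K"
MODEL_MAX_CTX = 262_144  # assumed token count behind "262K"

# Per-token slope from the Mac ~4-bit rows: 23.0 GB at 8K, 48.5 GB at 262K.
GB_PER_TOKEN = (48.5 - 23.0) / (MODEL_MAX_CTX - REF_CTX)

GPU_FRACTION = 0.95  # hypothetical share of VRAM usable for the model
MAC_FRACTION = 0.80  # hypothetical share of unified memory usable for the model


def max_context(memory_gb: float, usable_fraction: float, used_at_ref_gb: float) -> int:
    """Largest context (tokens) whose estimated usage fits the budget; 0 if none."""
    budget_gb = memory_gb * usable_fraction
    # Extrapolate from the 8K reference point along the per-token slope.
    ctx = REF_CTX + (budget_gb - used_at_ref_gb) / GB_PER_TOKEN
    if ctx <= 0:
        return 0  # even the fixed cost (weights, state) exceeds the budget
    return min(int(ctx), MODEL_MAX_CTX)


rows = [
    # (hardware, memory in GB, usable fraction, GB used at 8K from the table)
    ("RTX 3060 12GB", 12, GPU_FRACTION, 20.2),
    ("RTX 3090", 24, GPU_FRACTION, 20.2),
    ("RTX 5090", 32, GPU_FRACTION, 20.2),
    ("M5 Pro 48GB", 48, MAC_FRACTION, 23.0),
    ("M5 Max 64GB", 64, MAC_FRACTION, 23.0),
]
for name, memory_gb, fraction, used_gb in rows:
    ctx = max_context(memory_gb, fraction, used_gb)
    label = "won't fit" if ctx == 0 else f"up to {ctx / 1000:.0f}K"
    print(f"{name}: {label}")
# Prints:
# RTX 3060 12GB: won't fit
# RTX 3090: up to 34K
# RTX 5090: up to 110K
# M5 Pro 48GB: up to 162K
# M5 Max 64GB: up to 262K
```

The linear model only holds because the cache grows with context at a constant rate; any runtime buffers that scale differently would bend the line, which is one reason real usage can drift from these estimates.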
Qwen 3.6 27B is a hybrid model: only 16 of its 64 layers use full attention. The other 48 use linear attention, which keeps a fixed-size state instead of a KV cache that grows with context. Naive calculators count all 64 layers at full context and badly over-estimate the KV cache. Memory therefore grows with context far more slowly than a naive estimate suggests, and Qwen 3.6 27B has no single memory requirement: it shifts with quantization and context length.
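To see why counting every layer over-estimates, here is a minimal Python sketch of the usual F16 KV-cache size: keys plus values, per full-attention layer, per KV head, per head dimension, per token. Only the 16-of-64 layer split comes from the text above; `N_KV_HEADS` and `HEAD_DIM` are hypothetical placeholders, not values from Qwen 3.6 27B's config.json, so the absolute gigabytes are illustrative and will not match the tables. The ratio is what carries over: the naive count is 64 / 16 = 4 times larger whatever the head configuration.

```python
# Minimal sketch: KV-cache bytes for a hybrid model vs. a naive all-layer count.
# N_KV_HEADS and HEAD_DIM are hypothetical placeholders, NOT values from
# Qwen 3.6 27B's config.json; only the 16-of-64 layer split comes from the text.

BYTES_F16 = 2  # KV cache stored as F16 (2 bytes per element)


def kv_cache_bytes(context: int, attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = BYTES_F16) -> int:
    """Keys + values (factor 2) for every full-attention layer, head and token."""
    return 2 * attn_layers * n_kv_heads * head_dim * bytes_per_elem * context


TOTAL_LAYERS = 64
FULL_ATTN_LAYERS = 16  # linear-attention layers add no per-token cache
N_KV_HEADS = 4         # hypothetical
HEAD_DIM = 256         # hypothetical
CONTEXT = 262_144      # assumed token count behind "262K"

hybrid = kv_cache_bytes(CONTEXT, FULL_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM)
naive = kv_cache_bytes(CONTEXT, TOTAL_LAYERS, N_KV_HEADS, HEAD_DIM)
print(f"hybrid: {hybrid / 1e9:.1f} GB, naive: {naive / 1e9:.1f} GB, "
      f"ratio {naive / hybrid:.0f}x")
# Prints: hybrid: 17.2 GB, naive: 68.7 GB, ratio 4x
```

Because every factor except the layer count is shared, the over-estimate is exactly the ratio of total layers to full-attention layers, independent of the placeholder values.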
All numbers are computed by the open-source fitllm-engine (MIT) from the official model config.json values, so you can reproduce or audit them yourself. They are estimates; real usage varies with the runtime (llama.cpp, MLX, Ollama), driver and display.