Computed with the open-source FitLLM engine, which models the KV cache layer by layer instead of applying a naive estimate. Updated 2026-06.
| Component (8-bit, 32K context) | Memory |
|---|---|
| Model weights | 32.6 GB |
| KV cache | 0.6 GB |
| Runtime + macOS | 7.0 GB |
| Total used | 46.2 / 48 GB |
| Free | 1.8 GB |
Max context at 8-bit: does not fit. Unified memory is shared with macOS, so FitLLM reserves about 20% of it as headroom.
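This explains why 8-bit is marked as not fitting even though 1.8 GB is nominally free: once about 20% of 48 GB is held back, only about 38.4 GB is usable. Below is a minimal sketch of that check. It assumes the rule is a plain comparison of estimated usage against 80% of unified memory. `usable_budget_gb` and `fits_with_headroom` are hypothetical helpers, not fitllm-engine functions.

```python
# Minimal sketch of the ~20% headroom rule, assuming it is a plain comparison
# of estimated usage against (1 - headroom) of unified memory.
# Both helpers are hypothetical, not part of fitllm-engine.

def usable_budget_gb(total_gb: float, headroom: float = 0.20) -> float:
    """Memory left for the model after reserving a share for macOS."""
    return total_gb * (1.0 - headroom)


def fits_with_headroom(used_gb: float, total_gb: float, headroom: float = 0.20) -> bool:
    """True if the estimated footprint stays inside the usable budget."""
    return used_gb <= usable_budget_gb(total_gb, headroom)


total_gb = 48.0
used_8bit_gb = 46.2  # "Total used" from the 8-bit breakdown above

print(f"raw free: {total_gb - used_8bit_gb:.1f} GB")       # 1.8 GB: physically it squeezes in
print(f"budget:   {usable_budget_gb(total_gb):.1f} GB")     # 38.4 GB after 20% headroom
print(f"fits:     {fits_with_headroom(used_8bit_gb, total_gb)}")  # False
```

Keeping the headroom as a parameter makes it easy to see how close a configuration is to the line. For example, 8-bit would pass only with headroom below roughly 4%.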
| Quantization | Weights | Fits? (KV cache in F16) | Used at 32K context |
|---|---|---|---|
| 4-bit | 16.3 GB | ✅ up to 234K context | 28.0 / 48 GB |
| 8-bit | 32.6 GB | ⚠️ won't fit (over the 20% headroom) | 46.2 / 48 GB |
| 16-bit | 65.2 GB | ❌ won't fit (exceeds 48 GB) | 82.7 / 48 GB |
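The Weights column follows from parameter count × bits per weight ÷ 8. The minimal sketch below reproduces it under two assumptions: exactly 35 billion parameters (read off the "35B" in the model name) and a uniform bit width. The page's "GB" values line up with binary gigabytes (2^30 bytes). Real quantized files typically also store per-group scales, so they run somewhat larger.

```python
# Sketch: estimate weight memory from parameter count and bit width.
# Assumes exactly 35e9 parameters (from "35B" in the model name) and a
# uniform bit width; real quantized files also carry per-group scales.
PARAMS = 35e9
GIB = 2**30  # the page's "GB" figures match binary gigabytes


def weight_gib(params: float, bits: int) -> float:
    """Weight footprint in GiB for a uniform bits-per-parameter format."""
    return params * bits / 8 / GIB


for bits in (4, 8, 16):
    print(f"{bits}-bit: {weight_gib(PARAMS, bits):.1f} GB")
# 4-bit: 16.3 GB
# 8-bit: 32.6 GB
# 16-bit: 65.2 GB
```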
Qwen 3.6 35B-A3B is a hybrid model: only 10 of its 40 layers use full attention. The other 30 use linear attention, which keeps a fixed-size state instead of a KV cache that grows with context. Naive calculators count all 40 layers at full context and overestimate the KV cache about four times over (40 ÷ 10).
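The sketch below shows per-layer KV-cache sizing for a hybrid model. It assumes F16 K and V tensors of the same shape in every full-attention layer and a 32K context of 32,768 tokens. `N_KV_HEADS` and `HEAD_DIM` are hypothetical placeholders; read the real values from the model's config.json. The 4× gap between the hybrid and naive estimates does not depend on those placeholders.

```python
# Sketch of per-layer KV-cache sizing for a hybrid model. Only full-attention
# layers store K and V for every token; linear-attention layers keep a
# fixed-size state that does not grow with context, so it is left out here.
N_LAYERS = 40
N_FULL_ATTN_LAYERS = 10
N_KV_HEADS = 2   # hypothetical placeholder, check config.json
HEAD_DIM = 256   # hypothetical placeholder, check config.json
BYTES_F16 = 2
GIB = 2**30


def kv_cache_gib(attn_layers: int, ctx_tokens: int,
                 n_kv_heads: int = N_KV_HEADS, head_dim: int = HEAD_DIM,
                 bytes_per_elem: int = BYTES_F16) -> float:
    """Size of K and V across the given attention layers at a context length."""
    # factor 2: one K and one V vector per KV head, per layer, per token
    per_token = 2 * attn_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * ctx_tokens / GIB


ctx = 32 * 1024
hybrid = kv_cache_gib(N_FULL_ATTN_LAYERS, ctx)  # only full-attention layers
naive = kv_cache_gib(N_LAYERS, ctx)             # treats every layer as full attention
print(f"hybrid: {hybrid:.3f} GiB, naive: {naive:.3f} GiB, ratio: {naive / hybrid:.0f}x")
# hybrid: 0.625 GiB, naive: 2.500 GiB, ratio: 4x
```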
Other models that fit on the same 48GB Mac: Qwen 3.6 27B, Gemma 4 e2b, Gemma 4 e4b, Gemma 4 12b, Gemma 4 26b A4B.
All numbers are computed by the open-source fitllm-engine (MIT) from the values in each model's official config.json, so you can reproduce or audit them yourself. They are estimates: real usage varies with the runtime (llama.cpp, MLX, Ollama), driver, and display. Found a mismatch? Report it.