Computed with the open FitLLM engine, which uses accurate per-layer KV-cache modeling rather than a naive estimate. Updated 2026-06.
| Component (8-bit, 32K ctx) | Memory |
|---|---|
| Model weights | 25.3 GB |
| KV cache | 2.0 GB |
| Runtime + macOS | 6.3 GB |
| Total used | 39.7 / 48 GB |
| Free | 8.3 GB |
Max context at 8-bit: ~20K tokens. Unified memory is shared with macOS, so FitLLM keeps ~20% of it free as headroom.
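The headroom rule can be sketched as a simple check against a reduced budget. This is a minimal sketch, assuming the engine compares the estimated total against 80% of physical memory; the helper `fit_status` and its three labels are hypothetical and not part of fitllm-engine's API. Fed the 39.7 GB total from the breakdown above, it lands between the 38.4 GB headroom budget and the 48 GB physical limit.

```python
PHYSICAL_GB = 48.0  # unified memory on this Mac
HEADROOM = 0.20     # ~20% reserved for macOS (assumed to be a flat fraction)


def fit_status(total_gb: float, physical_gb: float = PHYSICAL_GB,
               headroom: float = HEADROOM) -> str:
    """Classify an estimated total against a headroom-adjusted budget (hypothetical helper)."""
    budget_gb = physical_gb * (1 - headroom)  # 38.4 GB for a 48 GB Mac
    if total_gb <= budget_gb:
        return "fits"
    if total_gb <= physical_gb:
        return "tight"  # fits physically, but eats into the OS headroom
    return "no fit"


print(fit_status(39.7))  # tight
```

Reading the table's ⚠️ as this "tight" case is an interpretation; the engine's exact rule may differ.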
| Quant | Weights | Fits (F16 KV cache) | Used @ 32K ctx |
|---|---|---|---|
| 4-bit | 12.7 GB | ✅ up to 162K ctx | 25.5 / 48 GB |
| 8-bit | 25.3 GB | ⚠️ up to 20K ctx | 39.7 / 48 GB |
| 16-bit | 50.7 GB | ❌ won't fit | 68.0 / 48 GB |
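The Weights column scales linearly with bit width. A minimal sketch, assuming weight memory is simply parameter count × bits / 8 and ignoring the per-group scales that real quantized formats also store; `weight_gib` is a hypothetical helper, and 27e9 is the rounded parameter count from the model name, not the exact figure from config.json.

```python
GIB = 1024 ** 3  # binary gigabyte


def weight_gib(n_params: float, bits_per_weight: int) -> float:
    """Raw weight storage in GiB: each weight takes bits/8 bytes (no quantization overhead)."""
    return n_params * bits_per_weight / 8 / GIB


N_PARAMS = 27e9  # nominal "27B"; the exact count in config.json may differ

for bits in (4, 8, 16):
    print(f"{bits}-bit: {weight_gib(N_PARAMS, bits):.1f} GiB")
```

With the rounded 27B count this gives about 12.6, 25.1, and 50.3 GiB, within roughly 1% of the table. The table's figures line up with binary units (GiB); decimal GB would come out near 13.5, 27, and 54.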
Qwen 3.6 27B is a hybrid model: only 16 of its 64 layers use full attention. The other 48 use linear attention, whose state stays a fixed size as the context grows, so they add no growing KV cache. Naive calculators charge all 64 layers a full-context KV cache and over-estimate that term by roughly 4×.
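The per-layer accounting can be sketched with the standard KV-cache size formula (2 tensors for K and V × layers × KV heads × head dim × bytes per element × tokens), applied only to the full-attention layers. The KV-head count and head dimension below are hypothetical placeholders, not values read from Qwen 3.6's config.json; with them, 16 layers at 32K tokens in F16 come to 2 GiB, while charging all 64 layers gives 8 GiB.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V vector per KV head, per token, per attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_tokens


# Hypothetical config values for illustration; read the real ones from config.json.
N_LAYERS = 64       # total layers (from the text)
N_FULL_ATTN = 16    # full-attention layers (from the text)
N_KV_HEADS = 4      # assumption
HEAD_DIM = 256      # assumption
CTX = 32 * 1024     # 32K tokens

hybrid = kv_cache_bytes(N_FULL_ATTN, N_KV_HEADS, HEAD_DIM, CTX) / GIB
naive = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CTX) / GIB
print(f"hybrid: {hybrid:.1f} GiB, naive: {naive:.1f} GiB")
```

The linear-attention layers still hold a small fixed-size state, which this sketch leaves out because it does not grow with context.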
Models that fit in 48 GB on the same Mac: Qwen 3.6 27B, Gemma 4 e2b, Gemma 4 e4b, Gemma 4 12b, Gemma 4 26b A4B.
All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. These are estimates: real usage varies with the runtime (llama.cpp, MLX, Ollama), driver, and display. Found a mismatch? Report it.