FitLLM

Can I run Qwen 3.6 27B on an M5 Pro 48GB Mac?

⚠️ Tight — it just fits — up to ~20K tokens at 8-bit

Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-06.

Memory breakdown (8-bit, F16 KV, 32K context)

Model weights: 25.3 GB
KV cache: 2.0 GB
Runtime + macOS: 6.3 GB
Total used: 39.7 / 48 GB
Free: 8.3 GB

Max context at 8-bit: ~20K tokens. Unified memory is shared with macOS, so FitLLM leaves ~20% headroom.

Every quantization on M5 Pro 48GB

Quant | Weights | Fits (KV F16) | Used @32K
4bit | 12.7 GB | ✅ up to 162K ctx | 25.5 / 48 GB
8bit | 25.3 GB | ⚠️ up to 20K ctx | 39.7 / 48 GB
16bit | 50.7 GB | ❌ won't fit | 68.0 / 48 GB
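
The Weights column can be approximated as parameter count times bits per weight. The following is a minimal sketch, not the fitllm-engine code. It assumes about 27.2 billion parameters (a value back-solved from the table, not read from config.json), treats "GB" as 2^30 bytes, and ignores quantization metadata such as scales. `weights_gib` is a hypothetical helper name.

```python
GIB = 1024 ** 3

# Assumption: ~27.2e9 parameters, back-solved from the table above,
# not taken from the model's config.json.
ASSUMED_PARAMS = 27.2e9


def weights_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight size in GiB, ignoring quantization metadata."""
    return n_params * bits_per_weight / 8 / GIB


for bits in (4, 8, 16):
    print(f"{bits}bit: {weights_gib(ASSUMED_PARAMS, bits):.1f} GB")
```

Under these assumptions the three results round to the 12.7, 25.3 and 50.7 GB shown in the table. Real quantized files are usually somewhat larger because of per-block scales and unquantized layers.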

Why most calculators get this wrong

Qwen 3.6 27B is a hybrid model: only 16 of its 64 layers use full attention — the rest are linear and keep no growing KV cache. Naive calculators count every layer at full context and badly over-estimate.
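
The gap between the two approaches can be illustrated with the standard KV-cache size formula. This is a rough sketch rather than the fitllm-engine implementation. The layer counts (16 full-attention layers out of 64) come from the text above, but the KV head count and head dimension are illustrative assumptions, not values read from the official config.json. `kv_cache_gib` is a hypothetical helper.

```python
GIB = 1024 ** 3


def kv_cache_gib(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size in GiB for layers that keep a growing cache."""
    # Factor 2 covers keys and values; bytes_per_elem=2 corresponds to F16.
    per_token = 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens / GIB


# Assumptions for illustration only (not from config.json):
N_KV_HEADS = 4
HEAD_DIM = 256
CONTEXT = 32 * 1024  # 32K tokens

hybrid = kv_cache_gib(16, N_KV_HEADS, HEAD_DIM, CONTEXT)  # full-attention layers only
naive = kv_cache_gib(64, N_KV_HEADS, HEAD_DIM, CONTEXT)   # every layer counted

print(f"hybrid: {hybrid:.1f} GiB, naive: {naive:.1f} GiB")
```

With these assumed dimensions, counting only the 16 full-attention layers gives 2.0 GiB at 32K context, while counting all 64 layers gives 8.0 GiB, a fourfold over-estimate. The sketch ignores the fixed-size state that linear layers keep, which does not grow with context.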

Other options

Models that fit on the same 48GB Mac: Qwen 3.6 27B, Gemma 4 e2b, Gemma 4 e4b, Gemma 4 12b, Gemma 4 26b A4B.

Reproduce it

All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. These are estimates; real usage varies with runtime (llama.cpp / MLX / Ollama), driver and display.