FitLLM

Can I run Gemma 4 26b A4B on a M5 Pro 48GB Mac?

✅ Yes, it fits: up to ~74K tokens of context at 8-bit

Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-06.

Memory breakdown (8-bit, F16 KV, 33K context)

Model weights: 23.5 GB
KV cache: 0.8 GB
Runtime + macOS: 5.9 GB
Total used: 36.3 / 48 GB
Free: 11.7 GB

Max context at 8-bit: ~74K tokens. Unified memory is shared with macOS, so FitLLM leaves ~20% headroom.

Every quantization on M5 Pro 48GB

Quant | Weights | Fits (KV F16) | Used @32K
4-bit | 14.7 GB | ✅ up to 262K ctx | 26.4 / 48 GB
8-bit | 23.5 GB | ✅ up to 74K ctx | 36.3 / 48 GB
16-bit | 45.1 GB | ❌ won't fit | 60.5 / 48 GB
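
The fit verdicts can be sanity-checked against the ~20% headroom rule with a few lines of Python. This is a minimal sketch, not the fitllm-engine code: the `fits` helper is hypothetical, and it assumes the headroom is applied as a flat fraction of total unified memory. The "used" figures are the Used @32K values listed on this page.

```python
def fits(used_gb: float, total_gb: float, headroom: float = 0.20) -> bool:
    """Return True if usage stays within total memory minus the headroom.

    Assumption: headroom is a flat fraction of total unified memory.
    """
    return used_gb <= total_gb * (1.0 - headroom)


TOTAL_GB = 48  # M5 Pro 48GB

# "Used @32K" figures from this page, in GB.
used_at_32k = {"4-bit": 26.4, "8-bit": 36.3, "16-bit": 60.5}

for quant, used in used_at_32k.items():
    verdict = "fits" if fits(used, TOTAL_GB) else "won't fit"
    print(f"{quant}: {used} GB used -> {verdict}")
```

Under that assumption the usable budget is roughly 38.4 GB, which agrees with the three verdicts on this page at 32K context.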

Why most calculators get this wrong

Gemma 4 26b A4B interleaves sliding-window (local) and global attention 5:1. The local layers cap their KV cache at the 1024-token window, and the global layers use a different head shape (head_dim 512 vs 256). A naive "all layers × full context × one head_dim" formula over-counts KV cache by several times.
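
The difference between the two formulas can be sketched in Python. This is an illustration under stated assumptions, not the fitllm-engine implementation. The 5:1 interleave, the 1024-token window, and the 256 / 512 head dimensions come from the paragraph above (read here as 256 for local layers and 512 for global layers). The layer count and KV-head counts are hypothetical placeholders, not values read from the official config.json.

```python
from dataclasses import dataclass


@dataclass
class AttnConfig:
    n_layers: int              # HYPOTHETICAL: total transformer layers
    kv_heads_local: int        # HYPOTHETICAL: KV heads in local layers
    kv_heads_global: int       # HYPOTHETICAL: KV heads in global layers
    local_per_global: int = 5  # 5 local layers per global layer (from the text)
    window: int = 1024         # sliding-window size in tokens (from the text)
    head_dim_local: int = 256  # assumed reading of "512 vs 256"
    head_dim_global: int = 512
    bytes_per_elem: int = 2    # F16 KV cache


def layer_kv_bytes(kv_heads: int, head_dim: int, tokens: int, bytes_per_elem: int) -> int:
    # Factor 2 covers the separate K and V tensors.
    return 2 * kv_heads * head_dim * tokens * bytes_per_elem


def kv_cache_bytes(cfg: AttnConfig, context: int) -> int:
    """Per-layer model: local layers cache at most `window` tokens."""
    period = cfg.local_per_global + 1
    total = 0
    for layer in range(cfg.n_layers):
        if layer % period == period - 1:  # every 6th layer is global
            total += layer_kv_bytes(cfg.kv_heads_global, cfg.head_dim_global,
                                    context, cfg.bytes_per_elem)
        else:
            total += layer_kv_bytes(cfg.kv_heads_local, cfg.head_dim_local,
                                    min(context, cfg.window), cfg.bytes_per_elem)
    return total


def naive_kv_cache_bytes(cfg: AttnConfig, context: int) -> int:
    """Naive model: every layer holds the full context with one head shape."""
    return cfg.n_layers * layer_kv_bytes(cfg.kv_heads_local, cfg.head_dim_local,
                                         context, cfg.bytes_per_elem)


# Placeholder shape for illustration only.
cfg = AttnConfig(n_layers=30, kv_heads_local=8, kv_heads_global=2)
ctx = 32 * 1024
accurate = kv_cache_bytes(cfg, ctx)
naive = naive_kv_cache_bytes(cfg, ctx)
print(f"per-layer: {accurate / 2**20:.0f} MiB, naive: {naive / 2**20:.0f} MiB, "
      f"ratio: {naive / accurate:.1f}x")
```

The local layers contribute a fixed amount once the context exceeds the window, so only the global layers grow with context length. With real config values substituted for the placeholders, the same two functions show how far the naive estimate drifts.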

Other options

Models that fit on the same Mac (48GB): Qwen 3.6 27B, Gemma 4 e2b, Gemma 4 e4b, Gemma 4 12b, Gemma 4 26b A4B.

Reproduce it

All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. These are estimates; real usage varies with the runtime (llama.cpp / MLX / Ollama), driver, and display.