Can I run Gemma 4 31b on a M5 Max 64GB Mac?


Yes. Gemma 4 31b fits in the M5 Max's 64 GB of unified memory at 8-bit, with up to about 60K tokens of context. Computed from the model's official config.json by the open-source FitLLM engine.

✅ Yes, it fits: up to ~60K tokens at 8-bit

Computed with the open FitLLM engine, which models the KV cache layer by layer instead of using a naive estimate. Updated 2026-07-16.

Memory breakdown (8-bit, F16 KV, 32K context)

Model weights	31.5 GB
KV cache	3.3 GB
Runtime + macOS	13.2 GB
Total used	48.0 / 64 GB
Free	16.0 GB

Max context at 8-bit: ~60K tokens. Unified memory is shared with macOS, so FitLLM keeps ~20% of it as headroom.
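The fit check behind these numbers can be sketched as plain arithmetic. This is a minimal sketch, assuming the ~20% headroom means the estimated total must stay at or below 80% of physical RAM (51.2 GB on a 64 GB Mac); FitLLM's actual rule may differ. The component sizes are the 8-bit breakdown figures, the per-quant totals come from the quantization table on this page, and the helper names `budget_gb` and `fits` are hypothetical.

```python
RAM_GB = 64.0
HEADROOM = 0.20  # ASSUMPTION: "~20% headroom" read as "use at most 80% of RAM"


def budget_gb(ram_gb: float, headroom: float = HEADROOM) -> float:
    """Usable memory ceiling after reserving headroom for macOS."""
    return ram_gb * (1.0 - headroom)


def fits(total_gb: float, ram_gb: float = RAM_GB) -> bool:
    """True if the estimated total stays under the headroom-adjusted ceiling."""
    return total_gb <= budget_gb(ram_gb)


# 8-bit breakdown at 32K context (GB), from the memory breakdown on this page.
breakdown = {"weights": 31.5, "kv_cache_f16": 3.3, "runtime_macos": 13.2}
total = sum(breakdown.values())

print(f"total {total:.1f} GB, free {RAM_GB - total:.1f} GB, "
      f"ceiling {budget_gb(RAM_GB):.1f} GB, fits={fits(total)}")
# prints: total 48.0 GB, free 16.0 GB, ceiling 51.2 GB, fits=True

# "Used @32K" totals from the quantization table on this page.
for quant, used in {"4bit": 32.0, "8bit": 48.0, "16bit": 78.1}.items():
    print(quant, "fits" if fits(used) else "won't fit")
# prints one line each: 4bit fits, 8bit fits, 16bit won't fit
```

Under this reading, the 16.0 GB "Free" figure is measured against the full 64 GB, and only about 3.2 GB of it remains before the headroom ceiling.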

Every quantization on M5 Max 64GB

Quant	Weights	Fits (KV F16)	Used @32K
4bit	17.2 GB	✅ up to 196K ctx	32.0 / 64 GB
8bit	31.5 GB	✅ up to 60K ctx	48.0 / 64 GB
16bit	58.3 GB	❌ won't fit	78.1 / 64 GB

Lower-bit quants free memory at some cost in output quality; 4-bit is the common sweet spot for local use.


Embed this verdict

Live badge for your README or model card, recomputed by the engine so it never goes stale:

[![fits: Gemma 4 31b on M5 Max 64GB Mac](https://img.shields.io/endpoint?url=https%3A%2F%2Ffitllm.run%2Fapi%2Fbadge%3Fmodel%3DGemma%25204%252031b%26ram%3D64%26quant%3D8)](https://fitllm.run/can-i-run/gemma-4-31b-on-m5-max-64gb)

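The badge link nests one URL inside another: the FitLLM endpoint URL is percent-encoded a second time and passed to the Shields.io endpoint badge as its `url` parameter, which is why the model name shows up as `Gemma%25204%252031b`. The minimal Python sketch below rebuilds the badge image URL for this page. The helper name `badge_url` is hypothetical, and it assumes the FitLLM endpoint accepts other `model`, `ram`, and `quant` values in the same shape as this example.

```python
from urllib.parse import quote


def badge_url(model: str, ram_gb: int, quant_bits: int) -> str:
    """Build a Shields.io endpoint badge URL pointing at the FitLLM badge API."""
    # First encoding: the model name inside the FitLLM query string (space -> %20).
    inner = (
        "https://fitllm.run/api/badge"
        f"?model={quote(model, safe='')}&ram={ram_gb}&quant={quant_bits}"
    )
    # Second encoding: the whole FitLLM URL becomes Shields.io's ?url= value,
    # so '%' turns into '%25', ':' into '%3A', '/' into '%2F', and so on.
    return "https://img.shields.io/endpoint?url=" + quote(inner, safe="")


print(badge_url("Gemma 4 31b", 64, 8))
```

For these arguments the output matches the image URL in the badge snippet above character for character.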

Or check from your terminal (exits 0 if it fits, 1 if not, so it works as a pre-download guard):

npx fitllm "Gemma 4 31b" --mac 64

Why most calculators get this wrong

Gemma 4 31b interleaves sliding-window (local) and global attention layers at a 5:1 ratio. The local layers cap their KV cache at the 1024-token window, and the global layers use a different head shape (head_dim 512 vs 256 on the local layers). A naive "all layers × full context × one head_dim" formula therefore over-counts the KV cache by several times.
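To illustrate the difference, here is a minimal Python sketch comparing a per-layer KV-cache estimate with the naive formula. Only the 5:1 local/global pattern, the 1024-token window, and the 256/512 head dims come from the text. The layer count, the KV-head counts, the position of the global layer within each group, and the choice of the local head shape for the naive formula are hypothetical placeholders, not Gemma 4 31b's real config.json values, so the printed sizes are illustrative and are not expected to reproduce the 3.3 GB figure above.

```python
from dataclasses import dataclass


@dataclass
class AttnConfig:
    num_layers: int        # HYPOTHETICAL placeholder
    local_per_global: int  # 5 local layers per global layer (5:1, from the text)
    window: int            # sliding-window size in tokens (1024, from the text)
    local_kv_heads: int    # HYPOTHETICAL placeholder
    global_kv_heads: int   # HYPOTHETICAL placeholder
    local_head_dim: int    # 256 (from the text)
    global_head_dim: int   # 512 (from the text)


def layer_bytes(tokens: int, kv_heads: int, head_dim: int, elem_bytes: int = 2) -> int:
    # K and V each hold tokens x kv_heads x head_dim values; F16 = 2 bytes each.
    return 2 * tokens * kv_heads * head_dim * elem_bytes


def per_layer_kv(cfg: AttnConfig, ctx: int) -> int:
    total = 0
    for i in range(cfg.num_layers):
        # ASSUMPTION: the last layer of every (local_per_global + 1)-layer group is global.
        if (i + 1) % (cfg.local_per_global + 1) == 0:
            total += layer_bytes(ctx, cfg.global_kv_heads, cfg.global_head_dim)
        else:
            # Local layers never cache more than the sliding window.
            total += layer_bytes(min(ctx, cfg.window), cfg.local_kv_heads, cfg.local_head_dim)
    return total


def naive_kv(cfg: AttnConfig, ctx: int) -> int:
    # All layers x full context x one head shape (ASSUMPTION: the local one).
    return cfg.num_layers * layer_bytes(ctx, cfg.local_kv_heads, cfg.local_head_dim)


cfg = AttnConfig(num_layers=60, local_per_global=5, window=1024,
                 local_kv_heads=8, global_kv_heads=4,
                 local_head_dim=256, global_head_dim=512)
ctx = 32_768
good, naive = per_layer_kv(cfg, ctx), naive_kv(cfg, ctx)
print(f"per-layer: {good / 2**30:.2f} GiB, naive: {naive / 2**30:.2f} GiB, "
      f"ratio {naive / good:.1f}x")
# prints: per-layer: 2.89 GiB, naive: 15.00 GiB, ratio 5.2x
```

Because only the global layers grow with context in this sketch, the two estimates agree below the window and drift apart as context increases.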

Other options

Models that fit on the same 64GB Mac: GLM-4.7-Flash, gpt-oss-20b, Qwen 3.6 27B, Qwen 3.6 35B-A3B, Qwen-AgentWorld-35B-A3B, Gemma 4 e2b, Gemma 4 e4b, Gemma 4 12b, Gemma 4 26b A4B, Gemma 4 31b, Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, MiniCPM5-1B, Qwen3-0.6B, Qwen3-1.7B, Llama-3.2-1B-Instruct, Gemma-3-1B-it.

Reproduce it


All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates: real usage varies with runtime (llama.cpp, MLX, Ollama), driver, and display. Found a mismatch? Report it.