Can I run Qwen 3.6 35B-A3B on an M5 Max 128GB Mac?

Yes. Qwen 3.6 35B-A3B fits in 128 GB of M5 Max unified memory at 8-bit, with up to about 262K tokens of context. These figures are computed from the model's official config.json by the open-source FitLLM engine.

✅ Yes, it fits: up to ~262K tokens of context at 8-bit

Computed with the open FitLLM engine, which models the KV cache layer by layer instead of using a naive estimate. Updated 2026-07-16.

Memory breakdown (8-bit, F16 KV, 32K context)

Model weights	32.6 GB
KV cache	0.6 GB
Runtime + macOS	13.0 GB
Total used	46.2 / 128 GB
Free	81.8 GB

Max context at 8-bit: ~262K tokens. Unified memory is shared with macOS, so FitLLM leaves ~20% headroom.
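
The breakdown above is plain addition and can be re-checked with a short sketch. The figures are copied from the table; treating the ~20% headroom as a flat 20% of total RAM is an assumption about how FitLLM applies it, not something this page states.

```python
# Figures copied from the 8-bit memory breakdown above (units as labeled: GB).
TOTAL_RAM_GB = 128.0
breakdown_gb = {
    "model_weights": 32.6,
    "kv_cache": 0.6,
    "runtime_and_macos": 13.0,
}

used_gb = round(sum(breakdown_gb.values()), 1)
free_gb = round(TOTAL_RAM_GB - used_gb, 1)

# Assumption: "~20% headroom" means a flat 20% of total RAM is kept free.
HEADROOM_FRACTION = 0.20
budget_gb = round(TOTAL_RAM_GB * (1 - HEADROOM_FRACTION), 1)

print(f"used   {used_gb} / {TOTAL_RAM_GB:.0f} GB")
print(f"free   {free_gb} GB")
print(f"budget {budget_gb} GB, within budget: {used_gb <= budget_gb}")
```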

Every quantization on M5 Max 128GB

Quant	Weights	Fits (KV F16)	Used @32K
4-bit	16.3 GB	✅ up to 262K ctx	28.0 / 128 GB
8-bit	32.6 GB	✅ up to 262K ctx	46.2 / 128 GB
16-bit	65.2 GB	✅ up to 262K ctx	82.7 / 128 GB

Lower-bit quantization frees memory at some cost in output quality; 4-bit is a common sweet spot for local use.
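
The Weights column scales linearly with bits per parameter. Here is a minimal sketch, assuming a round 35 billion parameters (taken from the model name, not from config.json) and assuming the page's "GB" are binary gigabytes of 2^30 bytes. Both are assumptions, and the function name is hypothetical, though together they do reproduce the table's three weight figures.

```python
# Assumption: parameter count taken from the model name ("35B"), not config.json.
PARAMS = 35e9
BYTES_PER_GIB = 2**30  # assumption: the page's "GB" are binary gigabytes


def weights_gib(bits_per_param: int, params: float = PARAMS) -> float:
    """Approximate weight size, ignoring quantization scales and metadata."""
    return params * bits_per_param / 8 / BYTES_PER_GIB


for bits in (4, 8, 16):
    print(f"{bits}-bit: {weights_gib(bits):.1f}")
```

Real quantized files usually carry some extra scale and metadata overhead, so treat this as a lower bound rather than an exact file size.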

Embed this verdict

Live badge for your README or model card — recomputed by the engine, never stale:

[![fits: Qwen 3.6 35B-A3B on M5 Max 128GB Mac](https://img.shields.io/endpoint?url=https%3A%2F%2Ffitllm.run%2Fapi%2Fbadge%3Fmodel%3DQwen%25203.6%252035B-A3B%26ram%3D128%26quant%3D8)](https://fitllm.run/can-i-run/qwen-3-6-35b-a3b-on-m5-max-128gb)
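
The badge image URL above is a shields.io endpoint URL whose `url` value is the FitLLM badge URL, percent-encoded a second time. A small sketch of how such a URL could be assembled for another configuration follows. The parameter names `model`, `ram`, and `quant` are read off the URL above; the function name is hypothetical, and which other values the API accepts is not stated here.

```python
from urllib.parse import quote, urlencode


def badge_url(model: str, ram_gb: int, quant_bits: int) -> str:
    """Build the shields.io endpoint URL wrapping the FitLLM badge API."""
    # Inner URL: the badge endpoint with its query string, spaces as %20.
    query = urlencode(
        {"model": model, "ram": ram_gb, "quant": quant_bits}, quote_via=quote
    )
    inner = f"https://fitllm.run/api/badge?{query}"
    # Outer URL: the whole inner URL is percent-encoded again as one value.
    return "https://img.shields.io/endpoint?url=" + quote(inner, safe="")


print(badge_url("Qwen 3.6 35B-A3B", 128, 8))
```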

Or from your terminal (exit 0/1 — works as a pre-download guard):

npx fitllm "Qwen 3.6 35B-A3B" --mac 128
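
The same guard can be called from a script before starting a large download. This is a sketch only: it assumes exit status 0 means "fits" and 1 means "does not fit" (the page gives the codes but not which is which), and `model_fits` and its `run` parameter are hypothetical names introduced for illustration.

```python
import subprocess
from typing import Callable, Sequence


def model_fits(
    model: str,
    ram_gb: int,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True when the fitllm CLI exits with status 0."""
    cmd: Sequence[str] = ["npx", "fitllm", model, "--mac", str(ram_gb)]
    # Assumption: exit status 0 means "fits" and 1 means "does not fit".
    return run(cmd, check=False).returncode == 0
```

A download script could then call `model_fits("Qwen 3.6 35B-A3B", 128)` and skip the download when it returns False. The `run` parameter exists only so the function can be exercised without Node.js installed.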

Why most calculators get this wrong

Qwen 3.6 35B-A3B is a hybrid model: only 10 of its 40 layers use full attention. The other 30 use linear attention and keep no KV cache that grows with context. Naive calculators count every layer at full context, so they overestimate the KV cache by about a factor of four (40 layers counted instead of 10).
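
The difference can be sketched with the standard KV-cache size formula (keys plus values, per caching layer, per token). The layer counts come from the text above; the KV-head count and head dimension are illustrative assumptions, not values quoted on this page, and should be read from the model's config.json. The function name is hypothetical.

```python
def kv_cache_bytes(
    context_tokens: int,
    caching_layers: int,
    kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # F16
) -> int:
    """KV-cache size: keys and values for every caching layer and token."""
    return 2 * caching_layers * kv_heads * head_dim * bytes_per_value * context_tokens


# From the text: 40 layers in total, 10 of them full attention.
TOTAL_LAYERS, FULL_ATTENTION_LAYERS = 40, 10
# Assumptions for illustration only; read the real values from config.json.
KV_HEADS, HEAD_DIM = 2, 256
CONTEXT = 32_768

naive = kv_cache_bytes(CONTEXT, TOTAL_LAYERS, KV_HEADS, HEAD_DIM)
hybrid = kv_cache_bytes(CONTEXT, FULL_ATTENTION_LAYERS, KV_HEADS, HEAD_DIM)

print(f"naive:  {naive / 2**30:.2f} GiB")
print(f"hybrid: {hybrid / 2**30:.2f} GiB")
print(f"overestimate: {naive / hybrid:.0f}x")
```

The 4x ratio depends only on the layer counts; the absolute sizes depend on the assumed head values. The sketch also leaves out the fixed-size state of the linear-attention layers, which does not grow with context.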

Other options

Models that fit on the same Mac (128GB): GLM-4.7-Flash, gpt-oss-20b, Qwen 3.6 27B, Qwen 3.6 35B-A3B, Qwen-AgentWorld-35B-A3B, Gemma 4 e2b, Gemma 4 e4b, Gemma 4 12b, Gemma 4 26b A4B, Gemma 4 31b, Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, MiniCPM5-1B, Qwen3-0.6B, Qwen3-1.7B, Llama-3.2-1B-Instruct, Gemma-3-1B-it.

Reproduce it

All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates; real usage varies with the runtime (llama.cpp, MLX, or Ollama), the driver, and the display. Found a mismatch? Report it.