Can I run Qwen 3.6 35B-A3B on an M5 Pro 48GB Mac?

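Yes, but only just. At 8-bit, Qwen 3.6 35B-A3B uses about 46.2 GB of the M5 Pro's 48 GB of unified memory at the context length used in the breakdown below, leaving 1.8 GB free. That is far less than the ~20% headroom FitLLM normally reserves, so no maximum context is reported for 8-bit; 4-bit is the comfortable choice on this machine. Computed from the model's official config.json by the open-source FitLLM engine.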

⚠️ Tight — it just fits — at 8-bit

Computed with the open FitLLM engine — accurate per-layer KV-cache modeling, not a naive estimate. Updated 2026-07-16.

Memory breakdown (8-bit, F16 KV, 32K context)

Model weights	32.6 GB
KV cache	0.6 GB
Runtime + macOS	13.0 GB
Total used	46.2 / 48 GB
Free	1.8 GB

Max context at 8-bit: none within FitLLM's budget. Unified memory is shared with macOS, so FitLLM leaves ~20% headroom, and the 46.2 GB used here leaves only 1.8 GB (under 4% of 48 GB) free.
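The arithmetic behind the breakdown can be checked with a minimal sketch. The three component figures are copied from the table above; reading "~20% headroom" as "use at most 80% of unified memory" is an assumption made here for illustration, not a confirmed description of how the FitLLM engine applies it.

```python
# Figures copied from the memory breakdown (8-bit, F16 KV), units as reported.
TOTAL_GB = 48.0
weights_gb, kv_cache_gb, runtime_gb = 32.6, 0.6, 13.0

used_gb = weights_gb + kv_cache_gb + runtime_gb
free_gb = TOTAL_GB - used_gb

# ASSUMPTION: "~20% headroom" is interpreted as a budget of 80% of total memory.
HEADROOM = 0.20
budget_gb = TOTAL_GB * (1 - HEADROOM)

physically_fits = used_gb <= TOTAL_GB
within_budget = used_gb <= budget_gb

print(f"used {used_gb:.1f} GB, free {free_gb:.1f} GB, budget {budget_gb:.1f} GB")
print("physically fits:", physically_fits)
print("within headroom budget:", within_budget)
```

Under that reading, the 8-bit setup fits in physical memory but exceeds the headroom budget, which is one way to reconcile the "tight" verdict with the "does not fit" max-context result.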

Every quantization on M5 Pro 48GB

Quant	Weights	Fits (F16 KV)	Used at 32K ctx
4-bit	16.3 GB	✅ up to 234K ctx	28.0 / 48 GB
8-bit	32.6 GB	⚠️ tight, no headroom left	46.2 / 48 GB
16-bit	65.2 GB	❌ won't fit	82.7 / 48 GB

Lower quants free memory at some output-quality cost — 4-bit is the common sweet spot for local use.
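The Weights column scales linearly with bits per weight, which a short sketch can reproduce. Two things here are assumptions inferred from the table rather than stated by the source: a nominal parameter count of 35e9 (taken from the model name) and sizes reported in binary gigabytes (2^30 bytes). Real quantized files also carry scale and metadata overhead that this ignores.

```python
GIB = 2**30  # ASSUMPTION: the table's "GB" are binary gigabytes

# ASSUMPTION: nominal parameter count taken from the "35B" in the model name.
N_PARAMS = 35e9


def weights_gib(bits_per_weight: int, n_params: float = N_PARAMS) -> float:
    """Raw weight storage: parameters times bits per weight, converted to GiB."""
    return n_params * bits_per_weight / 8 / GIB


for bits in (4, 8, 16):
    print(f"{bits}-bit: {weights_gib(bits):.1f} GiB")
```

With those two assumptions the sketch gives 16.3, 32.6, and 65.2, matching the table's Weights column.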

Embed this verdict

Live badge for your README or model card — recomputed by the engine, never stale:

[![fits: Qwen 3.6 35B-A3B on M5 Pro 48GB Mac](https://img.shields.io/endpoint?url=https%3A%2F%2Ffitllm.run%2Fapi%2Fbadge%3Fmodel%3DQwen%25203.6%252035B-A3B%26ram%3D48%26quant%3D8)](https://fitllm.run/can-i-run/qwen-3-6-35b-a3b-on-m5-pro-48gb)

Or from your terminal (the command exits with status 0 or 1, so it works as a pre-download guard):

npx fitllm "Qwen 3.6 35B-A3B" --mac 48
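A download script could wrap that command and branch on its exit status. The sketch below is illustrative only: it assumes that status 0 means "fits" and 1 means "does not fit" (the source gives the two statuses but not which is which), and `fits` and `guard_download` are hypothetical helper names, not part of FitLLM.

```python
import subprocess


def fits(model: str, ram_gb: int, run=subprocess.run) -> bool:
    """Run the fitllm CLI and report whether it exited with status 0.

    ASSUMPTION: exit status 0 means "fits" and 1 means "does not fit".
    `run` is injectable so the function can be exercised without npx installed.
    """
    result = run(["npx", "fitllm", model, "--mac", str(ram_gb)])
    return result.returncode == 0


def guard_download(model: str, ram_gb: int, run=subprocess.run) -> str:
    """Hypothetical helper: decide whether to proceed before fetching weights."""
    if fits(model, ram_gb, run=run):
        return f"{model}: fits in {ram_gb} GB, proceeding with download"
    return f"{model}: does not fit in {ram_gb} GB, skipping download"
```

Calling `guard_download("Qwen 3.6 35B-A3B", 48)` runs the same command as above. It requires `npx` on the PATH; otherwise `subprocess.run` raises `FileNotFoundError`.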

Why most calculators get this wrong

Qwen 3.6 35B-A3B is a hybrid model: only 10 of its 40 layers use full attention. The other 30 use linear attention, whose state does not grow with context, so they add nothing to the growing KV cache. Naive calculators count all 40 layers at full context and overestimate the KV cache by roughly 4x.
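The effect of counting only full-attention layers can be sketched with the standard KV-cache size formula. The layer counts (10 of 40) come from the text above; the KV head count and head dimension below are illustrative placeholders, not values read from the model's config.json, and the sketch ignores whatever fixed-size state the linear layers keep.

```python
GIB = 2**30


def kv_cache_bytes(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V tensor (factor 2) per attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem


# ILLUSTRATIVE placeholders, not taken from the model's config.json.
N_KV_HEADS, HEAD_DIM = 2, 256
CONTEXT = 32 * 1024  # 32K tokens, F16 cache (2 bytes per element)

hybrid = kv_cache_bytes(10, N_KV_HEADS, HEAD_DIM, CONTEXT)  # full-attention layers only
naive = kv_cache_bytes(40, N_KV_HEADS, HEAD_DIM, CONTEXT)   # every layer counted

print(f"hybrid: {hybrid / GIB:.3f} GiB")
print(f"naive:  {naive / GIB:.3f} GiB")
print(f"overestimate: {naive / hybrid:.1f}x")
```

The 4x ratio depends only on the layer counts, so it holds whatever the real head configuration is. The absolute sizes depend entirely on the placeholder values and should not be read as the engine's output.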

Other options

Other models that fit on the same 48 GB Mac: gpt-oss-20b, Qwen 3.6 27B, Gemma 4 e2b, Gemma 4 e4b, Gemma 4 12b, Gemma 4 26b A4B, Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, MiniCPM5-1B, Qwen3-0.6B, Qwen3-1.7B, Llama-3.2-1B-Instruct, Gemma-3-1B-it.

Reproduce it

All numbers are computed by the open-source fitllm-engine (MIT) from official model config.json values, so you can reproduce or audit them yourself. They are estimates: real usage varies with the runtime (llama.cpp / MLX / Ollama), driver, and display. Found a mismatch? Report it.