I run Qwen models on an MBA M4 16 GB and an MBP M2 Max 32 GB. The MBA can handle models in line with its unified-memory capacity (with external cooling), e.g. Qwen3-Embedding-8B (not the 1B variant!), but inference is 4x-6x slower than on the MBP. I suspect the weaker SoC — the base M4 has far lower memory bandwidth than the M2 Max, and LLM inference is mostly bandwidth-bound.

Anyway, the shared memory of Apple's M-series SoCs is huge leverage: VRAM and RAM are the same pool (minus what macOS reserves for the system), so if you buy an M chip with 128+ GB of memory you can pretty much run SOTA models locally, and the price is significantly lower than dedicated AI GPU cards.
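A rough back-of-envelope check for "will it fit" (just a sketch: the bytes-per-parameter figures are approximate GGUF quantization sizes, and the 75% GPU-addressable fraction and 2 GB overhead margin are assumptions that vary by macOS version and runtime):

```python
# Rough estimate of whether a model's weights fit in Apple unified memory.
# Assumption: weights dominate; KV cache and runtime overhead are folded
# into a flat safety margin.

BYTES_PER_PARAM = {"fp16": 2.0, "q8_0": 1.0, "q4_k_m": 0.56}  # approx. GGUF sizes

def fits(params_b: float, quant: str, ram_gb: float, gpu_fraction: float = 0.75) -> bool:
    """gpu_fraction: assumed share of RAM macOS lets the GPU address."""
    weights_gb = params_b * BYTES_PER_PARAM[quant]
    budget_gb = ram_gb * gpu_fraction - 2.0  # ~2 GB margin for cache/overhead
    return weights_gb <= budget_gb

# An 8B model on a 16 GB MBA:
print(fits(8, "fp16", 16))    # False — ~16 GB of fp16 weights won't fit
print(fits(8, "q4_k_m", 16))  # True  — ~4.5 GB 4-bit quant fits comfortably
```

This also shows why the 8B embedding model works on the 16 GB machine at all: it only fits quantized, not at full fp16 precision.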