I've also been running Qwen 3.6 35B A3B on my Windows laptop (64 GB RAM, 4 GB GPU), and it's at least tolerable. It's not fast (a few tokens per second, slower than reading speed), but I can give it a task and come back later. That was a $600 laptop off eBay a few years ago, not a $6,000 machine.
Are these unified-memory Macs and giant 24 GB desktop GPUs achieving dozens or hundreds of tokens per second, enough to justify costing 10x-20x as much?
35B A3B runs at ~100 tokens per second on the top M5 Max GPU configuration.
I got around 50-60 tokens per second on my M3 Max, so 100 tps seems very realistic for a chip two generations newer with double the RAM.
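The cost question above comes down to simple arithmetic: compare the speed ratio with the price ratio. A rough sketch, assuming the laptop's "a few tokens per second" means 3 tok/s and that the $6,000 figure stands in for the M5 Max setup's price (both are assumptions; only ~100 tok/s, $600 and $6,000 come from the thread):

```python
# Rough throughput-per-dollar comparison.
# ASSUMPTIONS: laptop speed of 3 tok/s ("a few") and a $6,000 price for the
# M5 Max setup are illustrative; only ~100 tok/s, $600 and $6,000 appear in the thread.

def tokens_per_sec_per_kilodollar(tok_per_sec: float, price_usd: float) -> float:
    """Decode throughput normalized by price, in tokens/second per $1,000."""
    return tok_per_sec / (price_usd / 1000)

machines = {
    "laptop": (3.0, 600.0),     # assumed 3 tok/s, $600 used laptop
    "m5_max": (100.0, 6000.0),  # ~100 tok/s per the thread, assumed $6,000
}

for name, (tps, price) in machines.items():
    print(f"{name}: {tokens_per_sec_per_kilodollar(tps, price):.1f} tok/s per $1k")

laptop_tps, laptop_price = machines["laptop"]
mac_tps, mac_price = machines["m5_max"]
speed_ratio = mac_tps / laptop_tps   # how many times faster
cost_ratio = mac_price / laptop_price  # how many times pricier
print(f"speed ratio {speed_ratio:.1f}x vs cost ratio {cost_ratio:.1f}x")
```

Under these assumptions the faster machine delivers about 33x the speed for 10x the price, so it comes out ahead per dollar; plug in your own measured numbers to redo the comparison.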