Hacker News

The main memory bus is ~300 GB/s, which is on par with a MacBook Pro with the Pro chip. The Max chip only gets its ~600 GB/s of unified memory bandwidth (about 500 GB/s in practice for token generation) in the 40-core GPU variant, which runs $7k+ and is, ironically, more expensive than a desktop with dual 3090s. The 32-core variant, which is still wildly expensive, is around 400 GB/s.
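To see why bandwidth translates into token-generation speed, here's a back-of-envelope sketch. It assumes a dense model whose weights are all read from memory once per generated token. The 70B / 4-bit model is a hypothetical example, and `model_gb` and `max_decode_tokens_per_sec` are made-up helper names. Treat the output as a ceiling, not a prediction.

```python
# Back-of-envelope ceiling on token-generation (decode) speed.
# Assumption: a dense model whose weights are all streamed from memory once
# per generated token, so decode is limited by memory bandwidth.
# Real numbers come in lower (KV-cache reads, kernel overhead, etc.).


def model_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def max_decode_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/s: one full pass over the weights per token."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    weights = model_gb(70, 4)  # hypothetical 70B model at ~4 bits/weight -> 35.0 GB
    for bandwidth in (300, 400, 500):  # GB/s figures from the comment
        tps = max_decode_tokens_per_sec(bandwidth, weights)
        print(f"{bandwidth} GB/s: at most ~{tps:.1f} tokens/s")
```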

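Where this will really crush Apple is the initial prefill phase: 6,000+ cores vs. 32/40 GPU cores, plus active cooling with fans. Prefill processes the whole prompt at once and is compute-bound, while token generation is mostly limited by memory bandwidth. For local LLMs, prefill speed matters quite a bit more than tokens/second.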
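A similar rough estimate for prefill, assuming it is compute-bound and using the common rule of thumb of ~2 × parameters FLOPs per token. The TFLOPS values are hypothetical placeholders, not specs for either machine, and `prefill_seconds` is an illustrative name.

```python
# Rough prefill (prompt-processing) time, assuming prefill is compute-bound.
# Rule of thumb: a forward pass costs ~2 * n_params FLOPs per token. This
# ignores attention's extra cost, so it underestimates very long prompts.


def prefill_seconds(params_billions: float, prompt_tokens: int,
                    effective_tflops: float) -> float:
    """Estimated time to process the prompt before the first output token."""
    flops = 2 * params_billions * 1e9 * prompt_tokens
    return flops / (effective_tflops * 1e12)


if __name__ == "__main__":
    prompt_tokens = 8000  # e.g. a long document pasted into the context
    # Hypothetical sustained throughputs, NOT measured specs for any real machine.
    for label, tflops in (("assumed 20 TFLOPS", 20.0), ("assumed 100 TFLOPS", 100.0)):
        secs = prefill_seconds(70, prompt_tokens, tflops)
        print(f"{label}: ~{secs:.0f} s to first token for a 70B model")
```

The point is the ratio: for a fixed model and prompt, prefill time scales inversely with sustained compute, which is where a many-core, actively cooled GPU pulls ahead.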

In the end, neither is really worth it for LLM use compared to just building a desktop and port-forwarding Ollama over SSH.
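Here's a minimal sketch of the SSH setup. It assumes Ollama runs on the desktop on its default port 11434 and that `me@desktop` is a placeholder SSH destination. `ssh -N -L` holds a local port forward open without running a remote command, and `list_models` queries Ollama's `/api/tags` endpoint through the tunnel. Both helper names are hypothetical.

```python
import json
import shlex
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default HTTP port


def ssh_tunnel_cmd(remote: str, local_port: int = OLLAMA_PORT,
                   remote_port: int = OLLAMA_PORT) -> list[str]:
    """Build an ssh command that forwards a local port to Ollama on `remote`."""
    # -N: run no remote command; just keep the forward open.
    # -L: connections to local_port here go to localhost:remote_port on `remote`.
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", remote]


def list_models(local_port: int = OLLAMA_PORT) -> list[str]:
    """List the models the (tunneled) Ollama server has, via GET /api/tags."""
    url = f"http://localhost:{local_port}/api/tags"
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


if __name__ == "__main__":
    # Placeholder host; run the printed command in a separate terminal first.
    print(shlex.join(ssh_tunnel_cmd("me@desktop")))
    # Once the tunnel is up, this hits the desktop's Ollama:
    # print(list_models())
```

With the tunnel running, anything on the laptop that talks to `http://localhost:11434` reaches the desktop's GPU, provided nothing local is already listening on that port.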