> I'm curious what the mental calculus was that a $5k laptop would competitively benchmark against SOTA models for the next 5 years was.

Well, the hardware remains the same but local models get better and more efficient, so I don't think there is much difference between paying 5k for online models over 5 years vs getting a laptop (and well, you'll need a laptop anyway, so why not just get a good enough one to run local models in the first place?).

Even if intelligence scaling stays equal, you'll lose out on speed. A sota model pumping 200 tk/s is going to be impossible to ignore with a 4 year old laptop choking itself at 3 tk/s.

Even still, right now is when the first gen of pure LLM focused design chipsets are getting into data centers.

At a certain point, tokens per second stop mattering because the time to review stays constant. Whether it shits out 200 tokens a second versus 20, it doesn't much matter if you need to review the code that does come out.

If you have inference running on this new 128GB RAM Mac, wouldn't you still need another separate machine to do the manual work (like running IDE, browsers, toolchains, builders/bundlers etc.)? I can not imagine you will have any meaningful RAM available after LLM models are running.

No? First of all you can limit how much of the unified RAM goes into VRAM, and second, many applications don't need that much RAM. Even if you put 108 GB to VRAM and 16 to applications, you'll be fine.