It seems relevant for playing around with LLMs, but for actual work it still feels far off for me.

My productivity benefits from the best intelligence available, a decent context size, and a batch size of four.

My MacBook has 48 GB of RAM, but I want all of the above at a decent speed, and I also need the machine to run my development tools and test suites, ideally without the fans blasting at full load.

For the foreseeable future I will stay with providers rather than local inference, apart from niche use cases.

Yeah, agreed, but that's the point, really. If I could buy a 16 TB machine with 4 TPUs for ~$5K and run a frontier model locally, I would.

I'm in Australia, so we're probably not getting access to Fable again. We're learning that a faster model + better harness/framework > smarter model. So being able to run GLM5.2 locally and super-fast would be great.