>The ideal case would be something that can be run locally, or at least on a modest/inexpensive cluster.
It's obviously valuable, so it should be coming. I expect 2 trends:
- Local GPUs/NPUs will get LLM-focused variants with 50-100GB of VRAM and native support for MXFP4 and similar low-precision formats.
- Distillation will come for reasoning coding agents, probably one for each tech stack (LAMP, Android apps, AWS, etc.) × business domain (gaming, social, finance, etc.)
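
A back-of-the-envelope sketch of why 50-100GB is the interesting VRAM range: the bits-per-weight figures below are assumptions based on the OCP Microscaling spec (MXFP4 uses 4-bit elements plus one shared 8-bit scale per 32-element block, ~4.25 bits/weight), and KV cache and activation memory are ignored.

```python
# Rough VRAM needed just for model weights, by storage format.
# Assumed bits/weight (MXFP4 figure per the OCP MX spec: 4-bit
# elements + one 8-bit shared scale per 32-element block):
FORMAT_BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "int8": 8.0,
    "mxfp4": 4.0 + 8.0 / 32,  # = 4.25 bits/weight
}

def weight_vram_gb(params_billion: float, fmt: str) -> float:
    """GB of memory for the weights alone (no KV cache/activations)."""
    bits = FORMAT_BITS_PER_WEIGHT[fmt]
    return params_billion * 1e9 * bits / 8 / 1e9

for p in (70, 120, 180):
    print(f"{p}B params: bf16={weight_vram_gb(p, 'bf16'):.0f} GB, "
          f"mxfp4={weight_vram_gb(p, 'mxfp4'):.1f} GB")
```

Under these assumptions, a ~70B model needs ~37GB and even a ~180B model fits in ~96GB at MXFP4, versus 140-360GB at bf16, which is roughly why 50-100GB of local memory would cover most open-weight models.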