It's not at all trivial to build a "small but highly capable" model. You can sacrifice world knowledge, but only to an extent, and that isn't a silver bullet.
For an LLM, size is a virtue - the larger a model is, the more intelligent it is, all else being equal - and even aggressive distillation only gets you so far.
Maybe with significantly better post-training, heavy distillation from a very large and very capable model, and extremely high-quality synthetic data, you could fit GPT-5 Pro-tier reasoning and tool use, with severe cuts to world knowledge, into a 40B model. But not into a 4B one. And it would need some very specific training to know when to fall back to web search or knowledge databases, or to delegate to a larger cloud-hosted model.
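To make that fallback behavior concrete, here's a minimal sketch of what such a routing policy might look like. Everything here is illustrative: the names (`route`, `LocalResult`, `CONFIDENCE_FLOOR`), the keyword heuristic, and the idea of a self-reported confidence score are all assumptions, not any real model's API - in practice the routing decision would itself be learned during training.

```python
# Hypothetical routing policy for a small on-device model.
# All names and thresholds are illustrative assumptions.

from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.7  # below this, don't trust the local answer
KNOWLEDGE_TAGS = {"who", "when", "where"}  # crude proxy for knowledge-heavy queries

@dataclass
class LocalResult:
    answer: str
    confidence: float  # model's self-estimated reliability in [0, 1]

def route(query: str, local: LocalResult) -> str:
    """Decide which backend should handle the query."""
    words = set(query.lower().split())
    if words & KNOWLEDGE_TAGS:
        return "web_search"   # world-knowledge lookup: defer to retrieval
    if local.confidence < CONFIDENCE_FLOOR:
        return "cloud_model"  # hard problem: delegate to the big model
    return "local"            # small model handles it itself

print(route("who founded the company?", LocalResult("...", 0.9)))  # web_search
print(route("refactor this function", LocalResult("...", 0.4)))    # cloud_model
print(route("refactor this function", LocalResult("...", 0.9)))    # local
```

A keyword heuristic like this would of course be far too crude in practice; the point is only the shape of the decision - answer locally, retrieve, or escalate - which the post argues would have to be baked in by very specific training.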
And if we had the kind of training mastery required to pull that off? I'm a bit afraid of what kind of AI we would be able to train in a frontier run.
Nobody said it's trivial.