Not true. Big models buy you baked-in knowledge and long-context cohesion. A model can be trained to use search and knowledge-base tools more efficiently to mitigate the former, and harnesses/workflows can be designed to push models into small parallel threads to mitigate the latter.
The thing that big models will always bring to the table is the ability to YOLO weak/under-specified prompts and spend less time in the loop making sure work gets partitioned correctly. For smaller/simpler tasks, the P(success) difference isn't that big.
Knowledge-base access is not very useful in general, because a model doesn't have well-defined "known unknowns" that might trigger an agentic search of the outside knowledge base. Plus, surfacing knowledge you don't know much about is itself hard.
These things sound plausible, but have they actually been demonstrated? Wouldn't anyone who succeeded in making such a small but useful LLM be raking in the money now?
Cursor's Composer 2.5 is a perfect example. It's right on the heels of the frontier (for coding only) at an order of magnitude lower cost. As much as I've shit on Cursor in the past, I do think the company is well positioned to pick up people getting sticker shock on Anthropic tokens, if they can get their marketing down.
If that's Kimi-based, it would very much be on the larger side of open-weight models (1T params).
It is, but the US labs have been pushing parameter counts heavily. There was a pullback from big models after GPT-4.5 in particular, but with the shift toward emphasizing post-training and the good results Google got from scaling Gemini 3, all the labs started pushing scaling again, which is why the frontier is getting more expensive. So that 1T isn't as big as it sounds; the American frontier is probably sitting at 3-5T at least.