So if we assume this is the future, the useful life of many semiconductors will fall substantially. What part of the semiconductor supply chain would have pricing power in a world of producing many more different designs?

Perhaps mask manufacturers?

It might not be that bad. "Good enough" open-weight models are almost there, and the focus may shift to agentic workflows and effective prompting. The lifecycle of a model (and the chip it runs on) will become comparable to smartphones', getting longer and longer, with orchestration software being responsible for the faster innovation cycles.

"Good enough" open-weight models have been "almost there" since 2022.

I distrust the notion. The bar for "good enough" seems to be bolted to "like today's frontier models", and frontier model performance only ever goes up.

The generation of frontier models from H1 2025 is the good enough benchmark.

Flash forward one year and it'll be H1 2026.

I don’t see why. Today’s frontier models are already two generations ahead of good enough. For many users they did not offer substantial improvement; sometimes things even got worse. What is going to happen within a year that will make users want something beyond an already working solution? LLMs are reaching maturity faster than smartphones, which are now good enough to keep the same model for at least 5-6 years.

If you’re running at 17k tokens/s, what is the point of multiple agents?

Different skills and contexts. Llama 3.1 8B has only a 128k context window, so packing everything into it may not be a great idea. You may want one agent analyzing the requirements and designing the architecture, one writing tests, another writing the implementation, and a fourth doing code review. With LLMs it also matters not just what is in the context but what is absent from it, so the model won’t overthink.

EDIT: just in case, I define an agent as an inference unit with a specific preloaded context. In this case, at this speed, they don’t have to be async - they may run sequentially over multiple iterations.
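Under that definition, the pipeline above can be sketched in a few lines. This is a minimal illustration, not a real implementation: `run_inference` is a hypothetical stand-in for an actual model call (e.g. to Llama 3.1 8B), and the agent roles are the ones named in the comment. The point is the structure: each agent gets its own preloaded context and only the previous stage's output, and the stages run in sequence.

```python
def run_inference(system_prompt: str, user_prompt: str) -> str:
    # Placeholder for a real model call; returns a labeled stub so the
    # pipeline structure can be demonstrated without a running model.
    role = system_prompt.split(":")[0]
    return f"[{role}] output for: {user_prompt[:40]}"

# Each agent = (name, preloaded context). What each one does NOT see is
# as deliberate as what it does see.
AGENTS = [
    ("architect", "architect: analyze requirements, design the architecture"),
    ("test_writer", "test_writer: write tests for the proposed design"),
    ("implementer", "implementer: write the implementation"),
    ("reviewer", "reviewer: review the implementation"),
]

def pipeline(task: str) -> dict:
    """Run the agents sequentially, each seeing only the prior stage's output."""
    outputs = {}
    prev = task
    for name, context in AGENTS:
        prev = run_inference(context, prev)
        outputs[name] = prev
    return outputs

results = pipeline("Build a rate limiter")
```

At 17k tokens/s, sequencing these four calls is cheap, which is why async orchestration isn't strictly needed here.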