So if we assume this is the future, the useful life of many semiconductors will fall substantially. What part of the semiconductor supply chain would have pricing power in a world of producing many more different designs?

Perhaps mask manufacturers?

It might not be that bad. "Good enough" open-weight models are almost there, and the focus may shift to agentic workflows and effective prompting. The lifecycle of a model (and the chip it runs on) will become comparable to smartphones', getting longer and longer, with orchestration software being responsible for the faster innovation cycles.

"Good enough" open-weight models have been "almost there" since 2022.

I distrust the notion. The bar for "good enough" seems to be bolted to "like today's frontier models", and frontier model performance only ever goes up.

The generation of frontier models from H1 2025 is the good enough benchmark.

Flash forward one year and it'll be H1 2026.

I don’t see why. Today’s frontier models are already two generations ahead of good enough. For many users they did not offer substantial improvement; sometimes things even got worse. What is going to happen within a year that will make users want something beyond an already working solution? LLMs are reaching maturity faster than smartphones, which are now good enough to keep the same model for at least 5-6 years.

If you’re running at 17k tokens/s, what is the point of multiple agents?

Different skills and contexts. Llama 3.1 8B has only a 128k context window, so packing everything into it may not be a great idea. You may want one agent analyzing the requirements and designing the architecture, one writing tests, another writing the implementation, and a fourth doing code review. With LLMs it also matters not just what is in the context but what is absent from it, so the model won’t overthink.

EDIT: just in case, I define an agent as an inference unit with a specific preloaded context. In this case, at this speed, they don’t have to be async - they may run sequentially over multiple iterations.
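Under that definition, the pipeline above can be sketched in a few lines. This is a minimal illustration, not a real implementation: `run_inference` is a hypothetical stand-in for an actual model call (e.g. to Llama 3.1 8B), and the agent roles are the ones named in the comment. The point is the structure: each agent gets its own preloaded context and only the previous stage's output, and the stages run in sequence.

```python
def run_inference(system_prompt: str, user_prompt: str) -> str:
    # Placeholder for a real model call; returns a labeled stub so the
    # pipeline structure can be demonstrated without a running model.
    role = system_prompt.split(":")[0]
    return f"[{role}] output for: {user_prompt[:40]}"

# Each agent = (name, preloaded context). What each one does NOT see is
# as deliberate as what it does see.
AGENTS = [
    ("architect", "architect: analyze requirements, design the architecture"),
    ("test_writer", "test_writer: write tests for the proposed design"),
    ("implementer", "implementer: write the implementation"),
    ("reviewer", "reviewer: review the implementation"),
]

def pipeline(task: str) -> dict:
    """Run the agents sequentially, each seeing only the prior stage's output."""
    outputs = {}
    prev = task
    for name, context in AGENTS:
        prev = run_inference(context, prev)
        outputs[name] = prev
    return outputs

results = pipeline("Build a rate limiter")
```

At 17k tokens/s, sequencing these four calls is cheap, which is why async orchestration isn't strictly needed here.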