Hacker News

_pdp_ a day ago

Frankly, it does not matter if there is a gap, because for most practical use cases the end user can barely perceive the difference in intelligence.

On paper, frontier models will be ahead of the curve, but I don't think anyone will be able to tell whether a piece of work, say a landing page, was created with Fable or GLM, and that is the point. Perceptible intelligence will reach a point beyond which it is no longer a consideration, except for some narrow use-case.

nomel 21 hours ago

> except for some narrow use-case.

I think it's entirely the opposite. For narrow use cases, like web pages and CRUD/GUI, the open source models don't show much of a difference.

mft_ 11 hours ago

100% agree.

My impression is that the open-weight models have been drawing close to level on coding tasks, while Anthropic and OpenAI have been putting large amounts of effort into developing their models' abilities in other domains: legal, biomedical/science, etc. Anthropic (especially?) has also been putting more obvious resources behind optimising their harnesses - from Code to Cowork (which is kinda Code for normies), Design, etc.

_pdp_ 10 hours ago

GLM 5.2 has replaced "normie" agentic workflows previously backed by Sonnet and Opus. So I don't know. From my end, it seems the open-weight models are perfectly capable of working agentically.

mft_ 8 hours ago

Maybe we have different definitions of 'normie'.

I'm talking about people who aren't in IT, and who are maybe just learning to use LLMs for aspects of their daily work. These people know only the big three models, at best - they very rarely know of the open-weight models, and would even more rarely (given that their model access is likely determined at a corporate level) be able to access them.

_pdp_ 6 hours ago

That's my point too.

If you take GLM and call it ChatGPT or Claude Opus, is anyone going to notice? If you are not into agentic AI, I would argue that the choice of model makes zero difference for day-to-day use, because GLM 5.2 is hitting the benchmarks hard.

Now, for a specialised use case (a narrow field), say cyber, Mythos is possibly better.

taffydavid 10 hours ago

You think web pages, CRUD and GUI are a narrow use case?