> Going from GLM-4.7 to something comparable to 4.5 or 5.2 would be an absolutely crazy improvement.
Before you get too excited, GLM-4.7 outperformed Opus 4.5 on some benchmarks too: https://www.cerebras.ai/blog/glm-4-7 (see the LiveCodeBench comparison).
The benchmarks of the open weights models are always more impressive than the performance. Everyone is competing for attention and market share so the incentives to benchmaxx are out of control.
Sure. My sole point is that calling Opus 4.5 and GPT-5.2 "last generation models" is discounting how good they are. In fact, in my experience, Opus 4.6 isn't much of an improvement over 4.5 for agentic coding.
I'm not immediately discounting Z.ai's claims because they showed with GLM-4.7 that they can do quite a lot with very little. And Kimi K2.5 is genuinely a great model, so it's possible for Chinese open-weight models to compete with proprietary high-end American models.
From a user perspective, I would consider Opus 4.6 somewhat of a regression. You can exhaust the five-hour limit in less than half an hour, and I used up the weekly limit in just two days. The outputs did not feel significantly better than Opus 4.5, and that only feels smarter than Sonnet by degrees. This is running a single session on a Pro plan. I don’t get paid to program, so API costs matter to me. The experience was irritating enough to make me start looking for an alternative, and maybe GLM is the way to go for hobby users.
I think there are two types of people in these conversations:
Those of us who just want to get work done don't care about comparisons to old models, we just want to know what's good right now. Issuing a press release comparing to old models when they had enough time to re-run the benchmarks and update the imagery is a calculated move where they hope readers won't notice.
There's another type of discussion where some just want to talk about how impressive it is that a model came close to some other model. I think that's interesting, too, but less so when the models are so big that I can't run them locally anyway. It's useful for making purchasing decisions for someone trying to keep token costs as low as possible, but for actual coding work I've never found it useful to use anything other than the best available hosted models at the time.
It's high-interest to me because open models are the ultimate backstop. If the SOTA hosted models all suddenly blow up or ban me, open models mitigate the consequence from "catastrophe" to "no more than six to nine months of regression". The idea that I could run a ~GPT-5-class model on my own hardware (given sufficient capex) or cloud hardware under my control is awesome.
For the record, Opus 4.6 was released less than a week ago.
That you think corporations are anywhere close to quick enough to update their communications on public releases like this only shows that you've never worked in corporate.
Yeah, I'm sure closed source model vendors are doing everything within their power to dumb down benchmarks, so they can look like underdogs and play a pity game against open weight models.
Let's have a serious discussion. Just because Claude's PR department coined the term benchmaxxing, we should not be using it unless they shell out some serious money.
I still enjoy using GLM 4.7 on Cerebras because of the speed you can get there and the frankly crazy amount of tokens they give you. Before that, 4.6 messed up file edits in OpenCode and the VSC plugins more frequently; 4.7 is way more dependable but still has occasional issues with Python indentation and partial edits (which might also be a tooling issue, e.g. using \ vs / as path separators in tool calls) - but the quality of the output went up nicely!
I hope GLM 5 will also be available on Cerebras, since that's my go-to for low- to medium-complexity work, with Codex, Claude Code, and Gemini CLI being nice for the more complex tasks.