In my tests[0], GLM-5.2 is not much better than GLM-5, and overall DeepSeek V4 Flash seems to be the better, more cost-effective choice:

[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...

I think the problem, as can also be seen on other benchmarks, is that most models nowadays are focused more and more purely on tool calling and coding.

This means that models are losing more and more general and domain-specific knowledge.

Look at these graphs on Artificial Analysis, where GLM-5.1 still performs similarly or better:

AA-Omniscience Accuracy: https://i.snipboard.io/5DYmpx.jpg

IFBench: https://i.snipboard.io/74kg0R.jpg

I still feel like models have not been getting any smarter for a few months now. They just changed their training to focus more on some areas than others, shifting the intelligence from one place to another without necessarily increasing the overall intelligence or "AGI" score.

man, i love dsv4-flash but i found its weaknesses in complex projects with multiple moving parts. tried kimi 2.6 and it understood and could work on the task. bigger is better..