I've been using GLM-5/5.1 for about 6 months and it has been a fairly capable model. I've seen a lot of mixed opinions that tend to align with harness usage so it is worth trying out a couple with a model before writing it off. For example, I'm using crush and have had a good experience while others using CC have had a much more mixed experience. For task complexity, I treat it as I would sonnet with the same care in building out plans/prompts before firing it off and letting it go.
I use intelliJ for much of my development and also set the built in AI tools to use my GLM sub (BYOK) and it has worked out well albeit a bit slow.
Overarll, it's my main model and has been getting better with each release.
Yeah, the harness makes a big difference in my experience. Some of the models don't even work with some harnesses, including some very big ones. And some are clearly distilled to work with specific harnesses.
I'd love to see some numbers though, on models/harness combinations.
https://www.tbench.ai/leaderboard/terminal-bench/2.0