They are, but from our evals, for example, GLM 5.2 (unquantized) performs as well as Opus but uses more tokens and takes more time.
I really wish this would change soon but they are not there yet.
Even using double the total tokens and taking, what, 2-3x the time, it still seems worth it if prices are 5x+ cheaper (which OpenRouter [1] claims is the case).
On NeuralWatt for my personal projects at home (not affiliated, just a happy customer), I get so much more mileage out of GLM than I get out of Claude at work, specifically because it's priced as a hammer I can pound any nail-shaped-object with, not a delicacy I need to carefully budget-analyze to try to figure out if it's worth burning my monthly spend limits on this task.
[1] https://openrouter.ai/compare/z-ai/glm-5.2/anthropic/claude-...
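To put rough numbers on the tokens-vs-price trade-off, here's a back-of-the-envelope sketch. The function name is made up, and the 2x / 5x figures are just the illustrative values from this comment, not measured data. It also assumes a single blended per-token price, which ignores input/output price splits and caching.

```python
def relative_cost(token_multiplier: float, price_divisor: float) -> float:
    """Spend on the cheaper model relative to the pricier one (1.0 = same spend).

    token_multiplier: how many times more tokens the cheaper model uses.
    price_divisor: how many times cheaper its per-token price is.
    Both are hypothetical inputs; plug in your own numbers.
    """
    if token_multiplier <= 0 or price_divisor <= 0:
        raise ValueError("both factors must be positive")
    return token_multiplier / price_divisor


# Illustrative values only: 2x the tokens at 5x cheaper per token.
ratio = relative_cost(token_multiplier=2.0, price_divisor=5.0)
print(f"relative spend: {ratio:.2f}x")  # prints "relative spend: 0.40x"
```

Under those assumptions, break-even is wherever the token multiplier equals the price divisor, so a 5x cheaper model would have to burn 5x the tokens before it stopped saving money. The 2-3x wall-clock time isn't priced in here at all.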
I thought true token use was being hidden by both Anthropic and OpenAI.
No, they do specify token counts, as they let you pay for them. They just don't tell you what these thinking tokens actually are.
Though because they don't show you, they could be lying about it. I think that's very unlikely, as it would be too dangerous for them IMO. But it's technically possible.