GLM-5.2 has been a step change in how fast I can burn through tokens.

I subscribed to their max plan to try it out. It billed me for 700M tokens and drained my weekly quota in under 2 days.

The quota reset less than 24h ago and I'm already at >60% of my weekly quota.

For reference, the kind of work I did would have used somewhere between 3% and 5% of Codex max or Claude max.

The model is good, the plan is a scam
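
For a rough sense of the burn rate behind those numbers, here is a back-of-the-envelope sketch in Python. It assumes the 700M tokens were the entire weekly quota and that usage is roughly linear over time; neither is stated by the plan itself, and the function name is made up for illustration.

```python
def days_until_exhausted(fraction_used: float, days_elapsed: float) -> float:
    """Project how many days a quota lasts at a constant burn rate."""
    if fraction_used <= 0 or days_elapsed <= 0:
        raise ValueError("fraction_used and days_elapsed must be positive")
    return days_elapsed / fraction_used


# Assumption: 700M tokens was the full weekly quota, gone in (at most) 2 days.
WEEKLY_QUOTA_TOKENS = 700_000_000
tokens_per_day = WEEKLY_QUOTA_TOKENS / 2  # lower bound on the daily burn rate

# After the reset: >60% used in under 24h, so this is an optimistic projection.
projected_days = days_until_exhausted(fraction_used=0.60, days_elapsed=1.0)

print(f"burn rate: at least {tokens_per_day / 1e6:.0f}M tokens/day")
print(f"quota would last at most about {projected_days:.1f} days of a 7-day week")
```

At that pace the weekly quota covers well under a third of the week.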

Kimi and GLM models have coined a new term: Thinkslop. They run a chain of thought that is up to 10x longer than other models and it seems that through a lookback mechanism they are able to use the CoT to reason about solutions to tasks they couldn't otherwise solve.

The downside is of course that they consume many more tokens off your plan, and also that they are significantly slower. Kimi K2.7 takes about 7x longer to finish the same benchmark tasks as DeepSeek V4 Pro on my router benchmarks (https://role-model.dev/).

So for now I'm happy with just two models: GPT and DeepSeek.

> Kimi and GLM models have coined a new term: Thinkslop.
> [...]
> So for now I'm happy with just two models: GPT and DeepSeek.

1. DeepSeek V3.2, V4 Flash, or V4 Pro? At high or max thinking? ... When recommending a model, always name a precise model, not just an AI lab.

2. DeepSeek V4 Flash at max thinking is the most verbose model (among top models) in the AA benchmarks. See the "Intelligence Index Token Use" chart: [1]

[1]: https://artificialanalysis.ai/models?models=gpt-5-5-high%2Cg...

I said specifically V4 Pro. Flash is not the most verbose, that's more likely to be Kimi.

Yeah, Kimi K2.7 was doing OK but was painfully slow. The coding plan limits were good though.

I haven't tried DeepSeek yet, I should check this one out.

After the release of K2.7, the Kimi plan quotas have been reduced by about 80%.

Turning up the thinking (max time spent thinking) lever really changes model performance, even for tiny models. But it's really irritating because it adds a lot of time.

> The model is good, the plan is a scam

If it needs to generate that many tokens to do the same tasks, then it probably has higher inference costs. So (for you) the model is bad, the plan is the same plan.
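
To make that concrete, here is a toy calculation in Python. It assumes a plan metered purely in tokens; the per-task token count is a placeholder, and only the 700M figure and the "up to 10x" chain-of-thought multiplier come from upthread.

```python
def tasks_per_budget(budget_tokens: int, tokens_per_task: int) -> int:
    """Number of whole tasks that fit in a fixed token budget."""
    if tokens_per_task <= 0:
        raise ValueError("tokens_per_task must be positive")
    return budget_tokens // tokens_per_task


BUDGET_TOKENS = 700_000_000         # weekly figure reported upthread
BASELINE_TOKENS_PER_TASK = 100_000  # hypothetical placeholder, not a measurement
VERBOSE_MULTIPLIER = 10             # "up to 10x longer" CoT, as claimed upthread

baseline_tasks = tasks_per_budget(BUDGET_TOKENS, BASELINE_TOKENS_PER_TASK)
verbose_tasks = tasks_per_budget(
    BUDGET_TOKENS, BASELINE_TOKENS_PER_TASK * VERBOSE_MULTIPLIER
)

print(f"baseline model: {baseline_tasks} tasks per week")
print(f"verbose model:  {verbose_tasks} tasks per week")
```

Same plan, same budget: the verbose model just gets proportionally fewer tasks out of it.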

I gave it my standard:

"Make a pac-man game in a single html page"

It went off and argued with itself for 20 minutes about how to lay out the map and then timed out.

What kind of tasks have you been using it for?