So GLM emits fewer tokens and makes fewer tool calls, but still takes over twice as long to complete.

Can someone explain to me where that time is going, if not into the model operation itself?

Are the individual tool calls more complex, so each one takes longer to complete? Or is the tok/s rate lower because the model does more compute per token?
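To make the question concrete, here is a rough back-of-the-envelope sketch of how wall-clock time could break down. Everything in it is an assumption for illustration: the function name `wall_clock_seconds`, the simple additive model (queueing + decoding + tool execution), and all of the numbers, none of which come from the benchmark.

```python
def wall_clock_seconds(output_tokens, tokens_per_second,
                       tool_calls, seconds_per_tool_call,
                       queue_seconds=0.0):
    """Simplified, assumed model: total time = queueing + decoding + tools."""
    decode = output_tokens / tokens_per_second   # time spent generating tokens
    tools = tool_calls * seconds_per_tool_call   # time spent waiting on tools
    return queue_seconds + decode + tools


# Hypothetical numbers, NOT measurements of any real model or provider.
model_a = wall_clock_seconds(output_tokens=20_000, tokens_per_second=80.0,
                             tool_calls=40, seconds_per_tool_call=2.0)
model_b = wall_clock_seconds(output_tokens=15_000, tokens_per_second=20.0,
                             tool_calls=30, seconds_per_tool_call=2.0)

print(f"model A: {model_a:.0f} s, model B: {model_b:.0f} s")
```

With these made-up inputs, model B emits fewer tokens and makes fewer tool calls, yet takes more than twice as long, purely because of its lower tok/s. Slower tool calls or longer queueing would have the same kind of effect.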

I have noticed that Opus and GPT 5.5 are very good at adjusting their thinking / reasoning intensity depending on the task at hand, something the open weights models are still not as good at.

In addition to that, some of the open weights models like GLM 5.2 or DeepSeek v4 Pro tend to be MUCH slower at generating tokens, which contributes to the perceived slowness. That said, I wouldn't call models like GLM 5.2 slow by any means: it is currently one of the fastest models inside Notion.

Probably the data center where the model is running, more than anything. Another possibility is that Opus uses something like a Mixture of Experts approach, in which case the number of parameters active per token could be smaller than GLM's.

Could just be infra. I'm betting Anthropic is much better prepared.