Right, but when humans are writing the code, they have learned to focus on putting downward pressure on the complexity of the system to help mitigate this effect. I don't get the sense that agents have gotten there yet.

Big business LLMs even have the opposite incentive, to churn as many tokens as possible.

At least tokens are equivalent to measuring 'thinking'... I wouldn't mind if it burned 100k tokens to output a one line change to fix a bug.

The problem is maximizing code generated per token spent. This model of "efficiency" is fundamentally broken.