I get 98.6% cache hits on Claude Code. Short of drastic architectural changes, it's hard to imagine it getting much better.

A 98.6% cache hit rate doesn't distinguish an efficient workflow from an overly chatty linear agent that keeps re-reading the same context. It also says nothing directly about how much useful progress the process makes per token.
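
To put rough numbers on that, here is a minimal sketch of a toy model. It assumes a strictly linear session where every turn re-reads the whole prior context from cache and appends a fixed number of new tokens. The function name and the token counts are made up for illustration, not measurements from any real tool.

```python
def linear_session(prefix_tokens, new_tokens_per_turn, turns):
    """Toy model: return (cache_hit_rate, total_input_tokens) for a linear session.

    Assumes the whole prior context is served from cache on every turn
    and only the newly appended tokens are uncached.
    """
    cached = 0   # input tokens served from cache, summed over all turns
    total = 0    # all input tokens processed, summed over all turns
    context = 0  # context length before the current turn
    for turn in range(turns):
        # The shared prefix is only uncached on the very first turn.
        fresh = new_tokens_per_turn + (prefix_tokens if turn == 0 else 0)
        cached += context
        total += context + fresh
        context += fresh
    return cached / total, total


# Same 10,000 tokens of new content, delivered two ways (hypothetical sizes).
chatty_rate, chatty_total = linear_session(10_000, 100, 100)   # 100 tiny turns
batched_rate, batched_total = linear_session(10_000, 2_000, 5)  # 5 larger turns

print(f"chatty:  hit rate {chatty_rate:.1%}, input tokens {chatty_total:,}")
print(f"batched: hit rate {batched_rate:.1%}, input tokens {batched_total:,}")
```

Under these assumptions the chatty session has the better hit rate (about 98.7% versus 75%) while pushing roughly 19x more input tokens through the model for the same amount of new content.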

We are all going to be graded by (tickets closed / tokens burned) soon enough.

Sweet. I can get that up to infinity, assuming they're using IEEE-754 division.

I doubt it. The difference between someone slightly inefficient and someone extremely efficient isn't big enough to matter compared to how much they cost in salary.

You pay for cache hits on every turn, and even with the newest architectures, longer context is slower and more energy-intensive. Constructing concise turns that reuse the prefix and stop once new context is no longer useful helps, as does pushing generation down into cheaper models while using stronger models for verification.
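
A back-of-the-envelope sketch of the "you pay for cache hits on every turn" point. The prices are assumptions in relative units (uncached input at 1.0 per token, cached reads at 0.1), not any provider's actual rate card, and the model ignores output tokens and any cache-write surcharge. The function name and session sizes are hypothetical.

```python
def session_input_cost(prefix_tokens, new_tokens_per_turn, turns,
                       fresh_price=1.0, cached_price=0.1):
    """Toy model: return (cached_cost, fresh_cost) for a linear session.

    Prices are assumed relative units per token, not real pricing.
    Every turn re-reads the full prior context at the cached price.
    """
    cached_tokens = 0
    fresh_tokens = 0
    context = 0  # context length before the current turn
    for turn in range(turns):
        # The shared prefix is billed as fresh input only on the first turn.
        fresh = new_tokens_per_turn + (prefix_tokens if turn == 0 else 0)
        cached_tokens += context
        fresh_tokens += fresh
        context += fresh
    return cached_tokens * cached_price, fresh_tokens * fresh_price


# Hypothetical session: 10,000-token prefix, 100 turns of 100 new tokens each.
cached_cost, fresh_cost = session_input_cost(10_000, 100, 100)
print(f"cached reads: {cached_cost:,.0f}  fresh input: {fresh_cost:,.0f}")
```

Even with a 90% discount on cached reads, the re-read context in this toy session costs several times more than the fresh input, because the cached portion grows with every turn.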