98.6% cache hits doesn't distinguish an efficient workflow from an overly chatty linear agent repeatedly reusing the same context. Plus, it says nothing directly that the process has good useful progress per token.
98.6% cache hits doesn't distinguish an efficient workflow from an overly chatty linear agent repeatedly reusing the same context. Plus, it says nothing directly that the process has good useful progress per token.
We are all going to be graded by (tickets closed / tokens burned) soon enough.
Sweet. I can get that up to infinity, assuming they're using IEEE-754 division.
I doubt it, the difference between someone slightly inefficient and someone extremely efficient isn't big enough to matter compared to how much they cost in salary.