How can you have cache hit efficiency? Isn't it just a matter of not changing the previous context? I don't understand what knobs there are to tweak on this.
How can you have cache hit efficiency? Isn't it just a matter of not changing the previous context? I don't understand what knobs there are to tweak on this.
> Isn't it just a matter of not changing the previous context?
Yes, but a lot of harnesses change previous context. E.g. the system prompt injects the current time/date, working directory, files in the working directory, etc. Compaction also changes the whole previous context. I _think_ changing the list of tools also invalidates cache, so invoking a subagent with different tools would invalidate the cache.
My vague impression is that it's in a similar vein to functional programming languages. It generally disallows doing things that lead to bugs (cache misses in this case), and presumably allows you to do those things in a way that makes it much clearer that this is likely to cause cache misses. I would guess that in this paradigm, you don't mutate your existing session, you derive a new session by mutating the prior context into a new context.
changing between plan/build mode in some agents will change the tools list, which breaks the cache.
Cache is always there, it’s just that it only caches up to the point where an input token changes. So if the tools list is early in the prompt, changing it would limit cache for most of the prompt. If the tools list is the last thing, you could still get 99% cache hits even if it changes every turn.
After a couple of turns the system prompt is a small part of the context. Not changing the system prompt at all is key so that the rest of the history is itself part of the prefix.