You didn't quote the interesting part:
> our implementation is [that] it only prunes calls from > 3 user messages ago, if context is > 40K, and only if there's at least 20K tokens to be removed
Seems reasonable to me, and it explains why I can have long sessions (way longer than with Zed's agents) while still hitting the cache. Opencode is just missing per-provider cache TTLs.
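For anyone curious, here's a rough Python sketch of that pruning rule. It's my own guess at the logic, not opencode's actual code. The `Message` type and `prune_old_tool_outputs` name are made up, "calls" is read as tool-call outputs, and "> 3 user messages ago" is read as anything before the third most recent user message.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str             # "user", "assistant", or "tool" (assumed roles)
    tokens: int           # token count for this message
    pruned: bool = False  # True once the tool output has been dropped


def prune_old_tool_outputs(messages, keep_user_turns=3,
                           min_context=40_000, min_savings=20_000):
    """Hypothetical sketch of the quoted rule; returns True if anything was pruned."""
    live = sum(m.tokens for m in messages if not m.pruned)
    if live <= min_context:
        return False  # context is not over 40K yet

    # Walk backwards to find the 3rd most recent user message; tool outputs
    # before it count as "older than 3 user messages" (assumed interpretation).
    seen = 0
    boundary = None
    for i in range(len(messages) - 1, -1, -1):
        if messages[i].role == "user":
            seen += 1
            if seen == keep_user_turns:
                boundary = i
                break
    if boundary is None:
        return False  # not enough user turns for anything to be old

    candidates = [m for m in messages[:boundary]
                  if m.role == "tool" and not m.pruned]
    if sum(m.tokens for m in candidates) < min_savings:
        return False  # would save less than 20K tokens; not worth a cache miss

    for m in candidates:
        m.pruned = True
    return True
```

The 20K minimum is presumably what keeps caching effective: the prompt prefix only changes when a prune actually saves a lot, so most turns still reuse the cached prefix.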
I found that keeping current context utilization at 18% of the total context length was best for minimizing spend, across all models with a context length of 400K or more.
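To put that 18% in numbers: on a 400K window it means trimming or compacting once the live context passes about 72K tokens. A hypothetical helper (the names are made up and nothing here comes from opencode) could look like this, using integer percent to avoid float rounding:

```python
def context_budget(context_window: int, utilization_pct: int = 18) -> int:
    """Token budget for the live context, e.g. 18% of the model's window."""
    return context_window * utilization_pct // 100


def should_compact(current_tokens: int, context_window: int,
                   utilization_pct: int = 18) -> bool:
    """True once the live context exceeds the target utilization (assumed policy)."""
    return current_tokens > context_budget(context_window, utilization_pct)


print(context_budget(400_000))    # 72000
print(context_budget(1_000_000))  # 180000
```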