Why do you believe that caches are held in RAM? They don’t need RAM performance, and disk is orders of magnitude cheaper.

Because OpenAI's docs specifically say so:

https://developers.openai.com/api/docs/guides/prompt-caching...

> When using the in-memory policy, cached prefixes generally remain active for 5 to 10 minutes of inactivity, up to a maximum of one hour. In-memory cached prefixes are only held within volatile GPU memory.

You can opt in to storing the caches on local disk, but it's not the default. I haven't done the calculations for why they chose this, but given that disaggregated parallel prefill plus RDMA can recompute the KV cache very quickly, a disk would need enormous read bandwidth to beat recomputation (and flash drives wear out!).
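A rough back-of-envelope sketch of that trade-off, using entirely illustrative numbers (the model shape, prefill throughput, and context length below are assumptions, not OpenAI's actual figures):

```python
# Sketch: how much disk bandwidth would be needed to reload a KV cache
# as fast as a prefill cluster could recompute it. All numbers are
# illustrative assumptions for a roughly 70B-class model with FP16 KV
# cache and grouped-query attention.

layers = 80
kv_heads = 8          # grouped-query attention: far fewer KV heads than Q heads
head_dim = 128
bytes_per_elem = 2    # FP16

# Both K and V are stored per token, per layer:
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem

context_tokens = 100_000
cache_bytes = context_tokens * kv_bytes_per_token
print(f"KV cache size: {cache_bytes / 1e9:.1f} GB")

# Assumed aggregate prefill throughput of a disaggregated prefill setup:
prefill_tokens_per_s = 50_000
recompute_seconds = context_tokens / prefill_tokens_per_s
print(f"Recompute time: {recompute_seconds:.1f} s")

# Disk bandwidth needed to reload the cache in the same time:
required_bw = cache_bytes / recompute_seconds
print(f"Disk bandwidth to match recompute: {required_bw / 1e9:.1f} GB/s")
```

Under these assumed numbers the cache is tens of gigabytes and reloading it as fast as recomputing it would take well over 10 GB/s of sustained read bandwidth per request, which is why keeping it in GPU memory (or just recomputing) can pencil out better than disk.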