They absolutely are segregated

With OpenAI at least, you can specify the cache key, and they even have this in the docs:

> Use the prompt_cache_key parameter consistently across requests that share common prefixes. Select a granularity that keeps each unique prefix-prompt_cache_key combination below 15 requests per minute to avoid cache overflow.
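
For anyone who hasn't used it: it's just one extra parameter on the request. A minimal sketch with the Python openai SDK, assuming a recent version of the package that exposes prompt_cache_key as a typed parameter (older versions would need extra_body); the model name, prompt text, and the key value itself are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The long, stable part of the prompt goes first so it forms a cacheable prefix.
SYSTEM_PROMPT = "You are a support assistant for Acme Corp. ..."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What's my order status?"},
    ],
    # Requests sharing this key and a common prefix are routed so the
    # cached prefix is more likely to be reused.
    prompt_cache_key="support-bot-v1",
)
print(response.choices[0].message.content)
```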

> Select a granularity that keeps each unique prefix-prompt_cache_key combination below 15 requests per minute to avoid cache overflow.

Why below a certain number? Usually in caches a high number of requests keeps the cached bit from expiring or being replaced, no?

Does anyone actually compute / use this key feature? Or do you rely on implicit caching? I wish HN had a comment with a poll feature.

It would be important to use for relatively high-traffic use cases.

Let's say you have a chatbot with hundreds of active users. Their requests could get routed to different machines, which would mean the implicit caching wouldn't work.

If you set the cache key to a user ID, then it's more likely each user's chat gets cached on subsequent requests, something like the sketch below.
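
A hedged sketch of that per-user keying, again with the Python openai SDK; user_id, history, and chat_turn are hypothetical stand-ins for whatever your app actually uses:

```python
from openai import OpenAI

client = OpenAI()

def chat_turn(user_id: str, history: list[dict], user_message: str) -> str:
    """One chat turn; history is the user's prior messages (hypothetical shape)."""
    messages = history + [{"role": "user", "content": user_message}]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        # Keying by user ID steers each user's requests toward the same
        # cache, so the growing conversation prefix can be reused.
        prompt_cache_key=f"user-{user_id}",
    )
    return response.choices[0].message.content
```

This granularity also fits the guidance quoted above: a single user rarely sends more than 15 requests per minute, so each prefix-key combination stays under the limit.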