They absolutely are segregated
With OpenAI, at least, you can specify the cache key, and they even have this in the docs:
> Use the prompt_cache_key parameter consistently across requests that share common prefixes. Select a granularity that keeps each unique prefix-prompt_cache_key combination below 15 requests per minute to avoid cache overflow.
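For anyone who hasn't tried it: it's just an extra request parameter. A minimal sketch with the OpenAI Python SDK, assuming a version recent enough to accept prompt_cache_key (the model name, prompt, and key value here are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Requests that share this key AND the same prompt prefix are candidates
# to hit the same cache.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},  # shared prefix
        {"role": "user", "content": "Summarize this document for me."},
    ],
    prompt_cache_key="docs-summarizer-v1",  # hypothetical key for this workload
)
print(response.choices[0].message.content)
```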
> Select a granularity that keeps each unique prefix-prompt_cache_key combination below 15 requests per minute to avoid cache overflow.
Why below a certain number? Usually with caches, a high request rate keeps an entry from expiring or being evicted, no?
Does anyone actually compute / use this key feature? Or do you just rely on implicit caching? I wish HN comments had a poll feature.
It would be important for relatively high-traffic use cases.
Say you have a chatbot with hundreds of active users: their requests could get routed to different machines, which would mean implicit caching wouldn't work.
If you set the cache key to a user id, then each user's chat is more likely to stay cached across subsequent requests.
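Something like this sketch, where the chat helper, user_id, and model name are all assumptions of mine rather than anything from OpenAI's docs:

```python
from openai import OpenAI

def chat(client: OpenAI, user_id: str, history: list[dict]) -> str:
    # Keying the cache per user keeps each unique prefix + key combination
    # well under the ~15 requests/minute guideline from the docs, and makes
    # it likelier that a user's growing conversation prefix stays warm.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=history,
        prompt_cache_key=f"user-{user_id}",  # per-user key
    )
    return response.choices[0].message.content
```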