The trick to doing this well is to split the prompt into the part that might change and the part that won't. If you're providing context like code, have it read all of that first, then (in a new message) give it your instructions. That way the context is written to the cache and you can reuse it even while you're editing your core prompt.
If you put it all in one message, every edit is a cache miss and a fresh cache write.
You can edit 10 times for roughly the price of one this way, since cache reads are priced much lower than regular input tokens.
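A rough sketch of the idea in Python, using Anthropic-style Messages API dicts (the `cache_control` field is real; the message contents and the helper name are made up for illustration):

```python
# Put the large, stable context in its own message with a cache breakpoint,
# then send the (frequently edited) instructions as a later message.
def build_messages(code_context: str, instructions: str) -> list[dict]:
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": code_context,
                    # Breakpoint: everything up to here gets written to the cache.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "assistant", "content": "Read it. Ready for instructions."},
        # Editable part: changing this does NOT invalidate the cached prefix.
        {"role": "user", "content": instructions},
    ]

v1 = build_messages("<10k lines of code>", "Find the bug in foo().")
v2 = build_messages("<10k lines of code>", "Now refactor bar() instead.")
assert v1[0] == v2[0]  # the cached prefix is byte-identical across edits
```

Because the first message (and its breakpoint) is identical between calls, only the small instruction message is billed at the full input rate.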
Is Claude caching by whole message only? Pretty sure OpenAI caches up to the first differing character.
Interesting. Claude places explicit breakpoints. Afaik there's no way to break mid-message.
I believe (but I'm not positive) there are 4 breakpoints:
1. End of tool definitions
2. End of system prompt
3. End of messages thread
4. (Least sure) 50% of the way through messages thread?
This is how I've seen it done in open-source projects, and it seems optimal given the constraints of the Anthropic API (max 4 breakpoints).
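The four placements above might look like this as a request payload. The `cache_control` / `"ephemeral"` fields are the real Anthropic Messages API syntax; the model name, tool definition, and message contents are stand-ins, and the placement strategy is just the guess from this thread, not something documented:

```python
request = {
    "model": "claude-sonnet-4-20250514",  # stand-in model name
    "tools": [
        {
            "name": "read_file",
            "description": "Read a file from the repo.",
            "input_schema": {"type": "object", "properties": {}},
            "cache_control": {"type": "ephemeral"},  # 1. end of tool definitions
        }
    ],
    "system": [
        {
            "type": "text",
            "text": "You are a careful coding assistant.",
            "cache_control": {"type": "ephemeral"},  # 2. end of system prompt
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<older turns>",
                    "cache_control": {"type": "ephemeral"},  # 4. partway through the thread
                }
            ],
        },
        {"role": "assistant", "content": "Done."},
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<latest turn>",
                    "cache_control": {"type": "ephemeral"},  # 3. end of messages thread
                }
            ],
        },
    ],
}

# The API caps a request at 4 cache_control breakpoints; this uses all of them.
def count_breakpoints(obj) -> int:
    if isinstance(obj, dict):
        return ("cache_control" in obj) + sum(count_breakpoints(v) for v in obj.values())
    if isinstance(obj, list):
        return sum(count_breakpoints(v) for v in obj)
    return 0

assert count_breakpoints(request) == 4
```

The intuition behind the mid-thread breakpoint (4) is that in a long conversation the whole prefix can't stay stable forever, so a second marker inside the thread lets an older stretch of turns stay cached while the tail keeps changing.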