Funny, we are working to implement this same logic in our in-house financial categorization agent. When we have a repeat prompt it goes to a json that stores answers and only goes to AI for edge cases.

It’s a good idea

Awesome to hear you’ve done similar. JSON artifacts from runs seem to be a common approach for building this in house, similar to what we did with the muscle mem. Detecting cache misses is a bit hard without seeing what the model sees, part of what inspired this proxy direction.

Thanks for the nice words!