I built a platform that makes it easier to experiment with different system prompts in production. You record your own metrics, and it automatically ties them to whichever experiment treatment the user is in. For example, you can test two system prompts and see which one actually improves user success rates or engagement. This might be useful for something like a sales or customer support agent.
You can update these experiments and prompts within the UI so you don't have to wait for your next deployment.
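To make the idea concrete, here is a rough sketch of the general pattern in Python. This is not the platform's actual API: `EXPERIMENT`, `assign_treatment`, and `MetricLog` are made-up names for illustration, the prompts are placeholders, and hash-based bucketing is just one common way to keep a user in the same treatment across requests.

```python
import hashlib
from collections import defaultdict

# Hypothetical experiment definition. In a hosted setup this would live in
# the UI/backend so it can change without a redeploy.
EXPERIMENT = {
    "name": "support_agent_prompt",
    "treatments": {
        "control": "You are a helpful customer support agent.",
        "concise": "You are a customer support agent. Keep answers brief.",
    },
}


def assign_treatment(user_id: str, experiment: dict) -> str:
    """Deterministically bucket a user so they always get the same prompt."""
    names = sorted(experiment["treatments"])
    key = f"{experiment['name']}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(names)
    return names[bucket]


class MetricLog:
    """In-memory stand-in for a metrics store (illustrative only)."""

    def __init__(self, experiment: dict):
        self.experiment = experiment
        self._values = defaultdict(list)

    def record(self, user_id: str, metric: str, value: float) -> None:
        # The caller only supplies the user and the metric; the treatment
        # is looked up here, so every value is tied to it automatically.
        treatment = assign_treatment(user_id, self.experiment)
        self._values[(treatment, metric)].append(value)

    def mean(self, treatment: str, metric: str):
        """Average of a metric within one treatment, or None if no data."""
        values = self._values.get((treatment, metric), [])
        return sum(values) / len(values) if values else None


# Usage: pick the prompt for a user, then log an outcome for them later.
log = MetricLog(EXPERIMENT)
treatment = assign_treatment("user-123", EXPERIMENT)
system_prompt = EXPERIMENT["treatments"][treatment]
log.record("user-123", "ticket_resolved", 1.0)
```

Comparing `log.mean("control", "ticket_resolved")` against `log.mean("concise", "ticket_resolved")` is then the "which prompt actually works better" question, though a real comparison would also need enough users per treatment to be meaningful.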
It's still pretty early, but I'd love any feedback!