How good are the model's answers to your queries? Does the quality stay stable over time?

I am wondering how to measure that in the first place.