If you don't have the budget to test regularly, I'd question including this kind of metric at all. You'd be sampling the infrastructure's reliability at only a handful of points in time, which is too little data to give a meaningful signal. It could mislead future readers about how the overall system performs, in either direction.
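To put a rough number on how little a few samples tell you, here is a minimal sketch that computes a Wilson score interval for a success rate. The function name `wilson_interval` and the example counts are my own illustration, not anything from your setup, and it assumes each test run is an independent pass/fail trial, which real infrastructure outages often are not.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a success proportion.

    Hypothetical helper for illustration; assumes independent pass/fail trials.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Example: every one of 5 spot checks succeeded.
low, high = wilson_interval(5, 5)
print(f"5/5 successes: plausible reliability between {low:.2f} and {high:.2f}")
```

With only five checks, even a perfect record leaves a very wide range of plausible reliability values, which is why I'd hesitate to report the number at all.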
Personally, I'd just evaluate the model on its own merits, not the infrastructure's. The infrastructure is a constantly changing variable, and many infrastructure failures can be worked around by automatically resubmitting the failed request.
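For the automatic resubmission, something along these lines is usually enough. This is only a sketch: `TransientError`, `call_with_retry`, and the `send` callable are hypothetical names standing in for whatever your client raises and however you submit a request, and the attempt count and delays are arbitrary defaults.

```python
import random
import time

class TransientError(Exception):
    """Hypothetical marker for infrastructure failures worth retrying."""

def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry on TransientError with exponential backoff and jitter.

    `send` is any zero-argument callable that submits the request.
    `sleep` is injectable so the backoff can be skipped in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the failure to the caller.
            delay = base_delay * 2 ** (attempt - 1)
            # Add random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay))
```

You'd wrap each evaluation request, e.g. `call_with_retry(lambda: client.submit(prompt))` where `client.submit` is whatever your own code uses. Only failures that still occur after the retries would then count against the run, which keeps the results about the model rather than a flaky endpoint.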