I probably shouldn't have led with that example because yeah, reproducible (and cheap) builds would be best for security audits. But I wouldn't say it's absolutely meaningless. At least it can guide your experimentation, and if results start differing radically from what you'd expect from the training data, that raises interesting questions.