Using prompt as source of truth is not reliable, because outputs for the same prompt may vary

Of course. But if this is the direction we’re going in, we need to try and do something no? Did you read the post I linked?