How is attempting to benchmark LLMs like religion?

Re-read the comment I'm replying to; it's not talking about benchmarks, just models.