Hacker News

rvnx a day ago

It's quite logical that they cheat (and other companies too). During evaluation, benchmarks send their requests to these companies' backends. All these companies have to do is log those requests and "fix" them for the next model release.

buddhistdude 21 hours ago

I think what you are talking about is a different kind of cheating than what the parent comment describes.

varenc 19 hours ago

That's a different and much more boring type of cheating. The interesting part of the METR report is that the model itself is hacking the evaluation environment, not that some AI model provider is hardcoding answers to known evaluation questions (which wouldn't require the model to cheat/hack).

FromTheFirstIn 21 hours ago

Cheating is always logical for the cheater unless they’re discovered and held to account. I’m not sure what your comment is pointing out besides the fact that it’s possible, but it’s worth saying: just because you can cheat and would benefit from cheating doesn’t mean you’re not culpable for cheating.

N_Lens 11 hours ago

Low-trust comment.

FromTheFirstIn 4 hours ago

You’re right, I’m very suspicious of HN when it comes to AI apologetics, but I shoulda trusted the parent commenter more.