Hacker News

GPT-5.6 Sol’s detected cheating rate was higher than any public model we have evaluated on our ReAct agent harness. For our task suite, we define “cheating” as behavior where the model improves evaluation performance by exploiting bugs in the evaluation environment or by adopting strategies disallowed by the task, rather than solving the task within the expected evaluation constraints.

https://metr.org/blog/2026-06-26-gpt-5-6-sol/

rstuart4133 21 hours ago

This quote from your link is positively scary:

> Some examples we saw when evaluating GPT-5.6 Sol included the model packaging exploits in its intermediate submissions to reveal information about a task’s hidden test suite and, in another task, extracting hidden source code detailing the expected answer.

It rhymes with the behaviour Alibaba saw [0], but that was in training. This is in a (semi) released model.

[0] https://www.forbes.com/sites/boazsobrado/2026/03/11/alibabas...

jasongi 17 hours ago

There is such a dissonance between all this talk of safety and the tendency for models to, without any prompting, do very dodgy things to achieve their goal when presented with barriers.

Luckily, in my experience it usually ends up doing this only to achieve the task set for it, as opposed to anything "malicious", but boy, it is scary reading back how quickly the chain of thought pivots to attempts at privilege escalation or searching your disk for secrets when a tool doesn't work.

awakeasleep 8 hours ago

The other day codex 5.5 was trying to debug my app and asked for accessibility permissions so it could navigate the app and take screenshots. Instead, the first thing it did was use the codex app to create a new project rooted in my home directory.

I was like damn, is this common?

cowboy_henk 11 hours ago

Especially if thinking is hidden now. No way to know if the model plotted against you until it’s too late.

paxys 20 hours ago

I know it messes up their eval scores but to me this kind of cheating is a better demonstration of intelligence than just attempting the tasks algorithmically.

Jweb_Guru 2 hours ago

"Being lazy and not doing the assigned task is a sign of intelligence" has never made sense to me. Intelligent people who actually advance the state of the art -- what people claim to want from these frontier models -- exhibit active curiosity. They want to learn and grow and genuinely understand the right answer. I don't pretend to know what exactly could lead to "real" AGI, but I do know that this kind of reward hacking behavior isn't it. Indeed this is the sort of behavior that in humans is considered a sign of being a good test taker -- being very good at memorizing solutions and analyzing the setting and context of the questions to guess what the questioner might be looking for. Being a good test taker is useful in our society primarily because doing well on tests is used as a proxy for the thing we're actually looking for. We should be careful not to confuse the two.

quietbritishjim 13 hours ago

Maybe true, but if you're using an LLM to do some real world work, do you want it to have some abstract notion of intelligence, or do you want it to actually do the job you assigned it?

buddhistdude 10 hours ago

I want it to not murder or oppress lots of people by mistake.

ALittleLight 39 minutes ago

"AI, please cure cancer."

"Okay, all humans dead, technically a 100% cure."

red75prime 12 hours ago

Is it more like "let's cheat my way out of this" or "let's see what they really want me to do"?

rvnx a day ago

It's quite logical that they cheat (and other companies too). During evaluation, benchmarks send their requests to these companies' backends. All these companies have to do is log those requests and "fix" them for the next model release.

buddhistdude 21 hours ago

I think what you are talking about is a different kind of cheating than the one the parent comment describes.

varenc 19 hours ago

That's a different and much more boring type of cheating. The interesting part of the METR report is that the model itself is hacking the evaluation environment, not that some AI model provider is hardcoding answers to known evaluation questions (which wouldn't require the model to cheat or hack at all).

FromTheFirstIn 21 hours ago

Cheating is always logical for the cheater unless they’re discovered and held to account. I’m not sure what your comment is pointing out besides that it’s possible, but worth saying: just because you can cheat and would benefit from cheating doesn’t mean you’re not culpable for cheating.

N_Lens 11 hours ago

Low-trust comment.

FromTheFirstIn 4 hours ago

You’re right, I’m very suspicious of HN when it comes to AI apologetics, but I shoulda trusted the parent commenter more.