These "You're right to push back" scenarios scare me. I mostly code ML implementations, and some of the errors Claude Code (CC; I have only used Opus 4.7) makes are very sneaky. If you don't have sufficient experience in the area (I see this with people entering ML and writing their implementations with CC), you won't know when to question CC, and you will let errors or future pitfalls silently slip into your code. A recent example: there was data leakage in a model calibration step, which it refused to see as an error until I wrote out a detailed reason, and then it agreed that there was a "subtle leakage".
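For anyone newer to this, here is a rough sketch of what a leak-free calibration step might look like. Everything in it is an assumption for illustration (the function names, the 60/20/20 split, and histogram binning as the calibrator); it is not the code from the incident above. The idea is only that the calibrator is fit on rows that are in neither the training set nor the test set, and that the code refuses to run if the sets overlap.

```python
import random

def three_way_split(n_rows, seed=0, frac_train=0.6, frac_calib=0.2):
    """Shuffle row indices once and cut them into disjoint train/calib/test lists."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = int(n_rows * frac_train)
    n_calib = int(n_rows * frac_calib)
    return idx[:n_train], idx[n_train:n_train + n_calib], idx[n_train + n_calib:]

def fit_histogram_calibrator(scores, labels, n_bins=5):
    """Histogram binning: map each score bin to the positive rate observed in it.

    Assumes scores lie in [0, 1].
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    # Empty bins fall back to the bin midpoint.
    rates = [sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
             for b in range(n_bins)]

    def calibrate(score):
        return rates[min(int(score * n_bins), n_bins - 1)]

    return calibrate

def fit_on_calibration_only(scores, labels, calib_idx, test_idx, n_bins=5):
    """Fit the calibrator on calibration rows only; fail loudly on any overlap with test."""
    overlap = set(calib_idx) & set(test_idx)
    if overlap:
        raise ValueError(f"{len(overlap)} rows are in both calibration and test")
    return fit_histogram_calibrator(
        [scores[i] for i in calib_idx], [labels[i] for i in calib_idx], n_bins
    )
```

The explicit overlap check is the part I would keep even in a PoC, since it turns a silent leak into a crash.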
The leakage problem is so pervasive. None of the frontier models seem to have any idea how to actually hold out rows. God help you if you decide to change the data mix.
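One approach that tends to survive a change in data mix is to assign the split from a hash of a stable ID instead of from row position or a fresh random shuffle. A minimal sketch, assuming each row has some stable key (the `row_id` strings, the `salt` value, and the 10% test fraction here are all made up for illustration):

```python
import hashlib

def assign_split(row_id: str, test_pct: int = 10, salt: str = "v1") -> str:
    """Deterministically map an ID to 'train' or 'test' via a hash bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{row_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < test_pct else "train"

def split_rows(row_ids, test_pct: int = 10, salt: str = "v1"):
    """Group IDs by their hash-assigned split."""
    out = {"train": [], "test": []}
    for rid in row_ids:
        out[assign_split(rid, test_pct, salt)].append(rid)
    return out

# Adding more data later does not move existing IDs between splits.
old = split_rows([f"user-{i}" for i in range(200)])
new = split_rows([f"user-{i}" for i in range(400)])
assert not set(old["train"]) & set(new["test"])
```

The key should be whatever unit must not straddle the split (a user, a session, a document), which is often not the row itself.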
I was working on creating a next-n-actions predictor for one of our use cases and not paying much attention for a PoC. I was fairly happy with the progress for a few days, before actually reading the eval code and seeing that we leaked the final state in every eval.
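To make that failure mode concrete, here is a small sketch of how eval examples for a next-n-actions predictor could be built so that the model input only ever contains a prefix of the episode. The `Episode` type, its fields, and the action names are hypothetical; the real eval code is not shown here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Episode:
    actions: tuple
    # Only known after the last action; must never reach the model input.
    final_state: str

def make_eval_examples(episode: Episode, n: int):
    """For each cut point t, the context is actions[:t] and the target is the next n actions."""
    examples = []
    for t in range(1, len(episode.actions) - n + 1):
        examples.append({
            "context": episode.actions[:t],       # strictly before the cut
            "target": episode.actions[t:t + n],   # the n actions to predict
        })
    return examples

ep = Episode(
    actions=("open", "search", "click", "add_to_cart", "checkout"),
    final_state="purchased",
)
examples = make_eval_examples(ep, n=2)
# The builder never copies final_state into an example.
assert all("final_state" not in ex for ex in examples)
```

Because the builder only slices `actions`, the final state has no path into the context, which is easier to audit than filtering it out afterwards.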
It's nice to let Claude run loose on porting from framework to framework (port my code from TRL to NemoRL to Tinker to VeRL), but looking at what it does in the intermediate steps makes me want to claw my eyes out. And getting it to adhere to our domain model (e.g. we have an SFTConfig and a .to_trl(), or a Row and a .to_harmony()) is impossible.
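For readers who haven't seen this pattern, a domain model like that might look roughly like the sketch below: one framework-neutral config, with a small adapter method per target framework. The fields are invented for illustration, and the returned keys are my assumption that the target accepts the usual Hugging Face `TrainingArguments` names; check them against whatever TRL version you actually run.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SFTConfig:
    # Hypothetical framework-neutral fields; the real class is not shown in this thread.
    learning_rate: float
    epochs: int
    batch_size: int

    def to_trl(self) -> dict:
        """Translate to keyword arguments for a TRL-style training config.

        Key names are an assumption (Hugging Face TrainingArguments naming).
        """
        return {
            "learning_rate": self.learning_rate,
            "num_train_epochs": self.epochs,
            "per_device_train_batch_size": self.batch_size,
        }

cfg = SFTConfig(learning_rate=2e-5, epochs=3, batch_size=8)
trl_kwargs = cfg.to_trl()
```

The point of the pattern is that the framework-specific names live in exactly one place, which is the part the model keeps bypassing.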
Another version of this issue is when you push back but you were NOT "right to push back". In other words, the LLM's original solution was better than the pushback.
Most of the time my pushbacks are true improvements, but I've seen a couple of instances where the LLM was happy to downgrade its own good solution.
I've had those as well. Sometimes I'm asking clarifying questions because I'm not sure about the solution, and the LLM "interprets" that as pushback (as opposed to curiosity / enquiry), and sycophancy takes over. Sometimes it will simply change the code without ever answering the questions, or it will answer the questions along with it, but incorrectly - or with bad assumptions.
> Answer grounded in truth, with evidence and concrete proof, no guessing or assumptions allowed, no changes to files on disk.
I've used this a bunch as a suffix to try to prevent that. It works OK in most cases, though obviously not always, and it works better in the system/developer prompt if you have access to those. It seems I've used it about 1000 times since 2025/08, when I started using codex (minus transcription duplications, so maybe half of that?).
> Another version of this issue is when you push back but you were NOT "right to push back". In other words, the LLM's original solution was better than the pushback.
Indeed, it's easy to surface this: send one model's proposal to another model for a "review", then bounce the responses back and forth and ask which one is best. Both models will almost always say something like "The other proposal/review is better". I'm guessing that's because they somehow think it comes from the human, and "human is always right" or something.
What's mind-blowing to me is that people see the "you're right to push back" as anything besides hallucination / self-affirmation.
Dude, the fucking model is great for sure, but there is nothing behind the illusion. It doesn't know if something is right or wrong, or simpler or harder to reason about, etc.
It's just generating text, in a coherent manner while following rhetoric processes as a solid attempt at logical thinking
Why is that so hard for people to grok?
Our industry (and society after it) is beyond doomed with people seeing these self-affirmations as anything like "insightful" validation.
Looking on the bright side, where there's AI-generated muck, there will be brass for humans willing to clean it up.
How does it correct itself then? I often push back without giving it the way out, and it often does find it.
If your fantasy were real, then how can you also have it correct itself from a passable solution to a dumpster fire?
That fundamentally wouldn't happen if it wasn't just an illusion.
There is value in it for sure and I can use it to write a lot of simple code, which is 99.99% of enterprise software - but that's another topic.
The coding aspect is a great example of why I am skeptical of the claim that they can't reason (in their own way).
Something that can write a correct code snippet or even larger program that accepts the correct input and provides the correct output and otherwise is consistent with the given spec is doing something substantially more than just autocomplete.
I did say
> It's just generating text, in a coherent manner while following rhetoric processes as a solid attempt at logical thinking
So yeah, I do agree that they can do a very reasonable amount of reasoning. As a matter of fact, they reason about things better than an average Joe off the street, ime.
That's entirely unrelated to what I said though, I think you misinterpreted/misunderstood what I wrote earlier.
They can make solid attempts at reasoning; it's just not grounded in reality. The model just applies these rhetoric processes to the current text, but it doesn't understand whether it has actually reasoned correctly. Hence the answer "you're right to push back on this" is just the model being a sycophant. The sentence does not mean that anything of value has been communicated in either direction, and thinking that it has means the person in question is suffering from AI psychosis.
Are you questioning how LLMs work? It's not a mystery up for debate, it's an open, well known system, you can go learn it for yourself and see.
By generating plausible-sounding corrections.
We'll see.