I recently found out that Claude's latest model, Sonnet 4.6, scores the highest on Bullsh*tBench[0] (funny name, I know). It's a recent benchmark that measures whether an LLM refuses nonsense or pushes back on bad choices, so Claude has definitely gotten better on that front.

[0] - https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

I haven't tried talking to Sonnet much, but Opus 4.6 is very sycophantic. Not in the sense of always explicitly agreeing with you, but in that its answers strictly conform to the worldview implied by your questions and never step outside it or disagree with it.

It _does_ love to explicitly agree with anything it finds in web search though.

(Anthropic tries to fight this by adding a hidden prompt that makes it disagree with you and tell you to go to bed, which doesn't help.)

Good call on censoring yourself preemptively, otherwise HN could demonetize your comment.

You don’t have to star out things like that on HN.

Great link, thanks for sharing. It confirms what I saw empirically when comparing the different models in daily use.