Interesting. I wonder if that's related to the phenomenon mentioned in the Opus 4.6 model card[1], where increased reasoning effort leads to 4.6 overthinking and convincing itself of the wrong answer on many questions. It seems to be unique to 4.6; I guess they fried it a bit too much during RL training.

[1] https://www.anthropic.com/claude-opus-4-6-system-card