I used a paid model to try this. Same deal.

I think the real misleading thing is marketing propping up paid models as somehow infinitely better when most of the time it's the same exact shit.

I copied a comment with faulty (self-defeating) logic directly from HN and asked a bunch of models available to me (Gemini and Claude) if they could spot the issue. I figured it would be a nice test of reasoning since an actual human missed it. The only one that found the logic error without help was Claude 4.6 Opus Extended Thinking. The others at best raised relevant counterpoints to the supporting argument but couldn't identify the central issue. Claude's answer seemed miles ahead. I wonder if SotA advancements will continue to distinguish themselves.

Care to share the comment in question with the rest of us so we can check for ourselves? :-)

And midwits here saying "yeah bro they have some MUCH better model internally that they just don't release to the public", imagine being that dense. Those people probably went all in on NFTs too and told others "you just don't get it bro".