Meanwhile o3 cannot even count rocks in a picture. This is a recurring theme -- ChatGPT does really well at some things we consider hard but fails miserably at things even a child could do.

It's almost as if this is a non-human intelligence, with strengths and weaknesses different from those of human intelligence.

Is that really so surprising, considering the tremendous differences in underlying hardware and training process?

I think one cause of this (and of some other issues with LLM use) is that people see an LLM exhibit one human-level trait, the ability to use language, and assume it must come with other human-level capabilities, such as our ability to reason.

Don't you think it's interesting, though? If I had asked you three years ago which of the two, counting rocks or GeoGuessr, ChatGPT would beat humans at, would you have answered correctly?