Any test?? It's failing plenty of tests, not of intelligence, but of... let's call it not-entirely-dumbness. Like counting letters in words. Frontier models (like Gemini 2.5 Pro) frequently produce answers where one sentence is directly contradicted by another sentence in the same response. Also check out the ARC suite of problems, which most humans solve easily but LLMs find difficult.
yeah but a lot of those failures stem from underlying architecture issues. this would be like a bee saying "ha ha a human is not intelligent" because a human would fail to perceive UV patterns on flower petals.
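for the letter-counting case the architecture issue is concrete: the model never sees individual letters, only subword tokens. rough sketch of what the input actually looks like (assuming the tiktoken library and its cl100k_base encoding as a stand-in, not any particular model's tokenizer):

```python
# minimal sketch, assuming the tiktoken library is installed;
# cl100k_base is used here purely as a representative BPE encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

word = "strawberry"
token_ids = enc.encode(word)

# the model operates on these subword chunks, not on individual letters,
# which is why character-level tasks like letter counting are awkward for it
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]
print(pieces)                      # typically a few multi-character chunks, not 10 letters
print(len(word), len(token_ids))   # character count vs. token count the model sees
```

so asking it to count the r's is asking it about a representation it was never given directly.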
The letter-counting could possibly be excused on those grounds, but not the other instances.