Hacker News

nayroclade 3 days ago

The models they tested are already well behind the current state of the art. It would be interesting to see whether their results hold up when repeated with the latest frontier models.

StilesCrisis 3 days ago

I think we have all seen the latest models turn into a hot mess.

louiereederson 3 days ago

I interpret Figure 2 as showing that incoherence increases with model generations, albeit on a small sample size.