Hacker News

solumunus 15 hours ago

You’re completely overrating these benchmarks and it’s landing you at a nonsense opinion. Just actually use the models and you will see that the gap is significant.

irthomasthomas 11 hours ago

It should be easy for a company like Anthropic to prove this beyond a doubt. Why don't they? Why don't they have a collection of prompts and side-by-side comparisons with other models showing how far ahead they are?

largbae 10 hours ago

I think it's mainly because the difference between models at the frontier isn't "response to prompt X", but rather "coherence with 500K tokens of context and instructions in play".

viking123 4 hours ago

Good morning to the Anthropic office, good sir.