Hacker News

Eridrus 5 hours ago

Nobody releases numbers that show them to be worse than competitors lol.

This applies even to OpenAI & Anthropic, who often don't eval on the same datasets.