Hacker News

That's Grok 4.2, not 4.3, right?

And why are you comparing to gpt-4.1? (As opposed to one of the 6? model releases since then; I would have expected gpt 5.5.)

Good catch, there was an issue with the second hardest thing in programming (caching).

Here's an updated eval with the proper models: https://a3bmfqfom3.evvl.io/