It’s incredible how fast these models are getting better. I thought for sure a wall would be hit, but these numbers smash previous benchmarks. Anyone have any idea what the big unlock is that people are finding now?

Companies are optimizing for all the big benchmarks. That's why there's so little correlation between benchmark performance and real-world performance now.

Isn’t there? I mean, Claude Code has been my biggest use case and it basically one-shots everything now.

Yes, LLMs have become extremely good at coding (not software engineering, though). But try using them for anything original that cannot be adapted from GitHub and Stack Overflow. I haven't seen much improvement at all on such tasks.

Strongly disagree with this. And I'm going to provide as much evidence as you did.