Companies are optimizing for all the big benchmarks. This is why there is so little correlation between benchmark performance and real-world performance now.

Isn’t there? I mean, Claude Code has been my biggest use case, and it basically one-shots everything now.

Yes, LLMs have become extremely good at coding (not software engineering, though). But try using them for anything original that cannot be adapted from GitHub and Stack Overflow. I haven't seen much improvement at all on such tasks.

Strongly disagree with this. And I'm going to provide as much evidence as you did.