It’s incredible how fast these models are getting better. I thought for sure a wall would be hit, but these numbers smash previous benchmarks. Anyone have any idea what the big unlock is that people are finding now?

Companies are optimizing for all the big benchmarks. That's why there's so little correlation between benchmark performance and real-world performance now.

Isn’t there? I mean, Claude Code has been my biggest use case and it basically one-shots everything now.

Yes, LLMs have become extremely good at coding (not software engineering, though). But try using them for anything original that cannot be adapted from GitHub and Stack Overflow. I haven't seen much improvement at all on such tasks.

Strongly disagree with this. And I'm going to provide as much evidence as you did.