Seeing all these 'coding' benchmarks reminds me that people still don't understand what coding means in practice. People still think one-phase puzzle-solving is coding. Real coding almost always has multiple phases which build on top of one another. There is an architectural component which is missed here - and the sheer number of phases/layers is actually where most of the complexity comes from.

Usually what I need an LLM to do is find me an elegant algorithm for a problem I've encountered, where I know there's an elegant algorithm but I've got no idea what it's called or how to Google it.

That makes sense, but if the goal is to replace software engineers, as claimed, then these benchmarks aren't going to achieve that.

Companies are still stuck in a mindset that conflates software engineering with puzzle-solving. This is evident from their job interviews and also from these LLM benchmarks.