> Senior engineers build features without over-specified requirements
To me this already disqualifies the benchmark. That statement is missing the most critical piece about senior engineers: they know how to obtain input for their work on their own, whether that's talking to customers or using metrics. They never just come up with stuff on their own - that's junior behaviour.
Until a coding agent is able to *gather* the input on its own, it's never going to be "senior".
I'd take this a step further, but that step also curls back to the other side a small bit.
The real skill is being able both to pull the necessary information from these sources and to intuit gaps in that knowledge, based on your understanding of the business and your domain expertise & wisdom. Sometimes you can't get a perfect picture; sometimes the people who should know aren't able to tell you what they really need. You still need to do the right thing.
A benchmark like this can potentially do the second part. But I don't think any model would be good at it, for now.