I have been thinking that scores on these SWE benchmarks will continue to improve: these companies hire very intelligent software engineers, so they can task a multitude of them with solving problems and then train the model on those answers.

Data has always been the core of it all. Onward to the next abstraction, I suppose.

I think computational thinking, or basically "how do I solve this problem efficiently" training data, is more valuable than feeding in answers. I don't know what these AI models' training data consists of, but it would be interesting to see a model trained purely on reasoning, methods, and those foundational skills (basic programming? or maybe not), and then give it some benchmarks.