My guess has always been that if you took the sources of the training data, meaning the authors of the "best" answers and solutions on Stack Overflow or GitHub, and reformatted the question so it sounded like it was written by one of these experts, the generated code would try to hug these sources of truth as it was being produced.

So the actual challenge is to find a mapping from "problem" to "author", then from "author" to "related code", and from there to a solution.
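A toy sketch of that chain, just to make the idea concrete. Everything here is hypothetical: the two lookup tables are hand-written placeholders (in practice each hop would have to be mined or learned from Q&A and repository data), `rephrase_as_expert` is an illustrative prompt template rather than any real technique, and the final hop from prompt to solution (handing it to a code model) is not shown.

```python
from typing import Optional

# Hypothetical placeholder maps; real ones would need to be mined or learned.
PROBLEM_TO_AUTHOR: dict[str, str] = {
    "parse command-line arguments": "author_a",
    "retry a flaky HTTP call": "author_b",
}
AUTHOR_TO_CODE: dict[str, list[str]] = {
    "author_a": ["argparse-based CLI template"],
    "author_b": ["exponential-backoff retry helper"],
}


def rephrase_as_expert(question: str, author: str) -> str:
    """Hypothetical template: frame the question as if a known expert asked it."""
    return f"Answer written in the style of {author}: {question}"


def route(problem: str) -> Optional[tuple[str, list[str], str]]:
    """Follow problem -> author -> related code, then build an expert-framed prompt.

    The last hop (prompt -> solution) would go to a code model and is omitted.
    """
    author = PROBLEM_TO_AUTHOR.get(problem)
    if author is None:
        return None  # no known expert mapped to this problem
    code_refs = AUTHOR_TO_CODE.get(author, [])
    prompt = rephrase_as_expert(problem, author)
    return author, code_refs, prompt
```

For example, `route("retry a flaky HTTP call")` yields the author `"author_b"`, that author's related snippets, and a prompt reframed in their voice; an unmapped problem returns `None`, which is exactly where the hard part (finding the map) shows up.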