Considering that OpenAI's model scored higher than any of the world's best collegiate programming teams, I'd guess that a Mechanical Turk would not do better (even if you gave it quite a bit of time).