The tons of python code would be great training data if there was any consistency across the ecosystem. Yet every project I've touched required me to learn it's unique style. Then I'd imagine they practically poisoned half the training set because python2 is subtly different.