"It is built end-to-end by Microsoft using clean and appropriately licensed data."

Well, there is still no list or publication of the training data.