A few comments:

- It has long been known in other settings that a small number of data points can impact model performance; this result could perhaps be read as validation that the effect remains relevant at the largest scales.
- I wonder whether the reverse could also hold: if such a small amount of data in a training corpus can shift model performance in a negative direction, could the same amount of data shift it in a positive direction?
- I think this suggests there remains benefit to more authoritative data aggregators, such as respected publishers, journals, and libraries, where inclusion in such repositories can serve as a signal of reliability for training data (a minimal sketch of this idea follows below).
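
To make the third point concrete, here is a minimal sketch of what provenance-weighted sampling might look like when assembling training batches. All source names, trust values, and field names below are illustrative assumptions, not anything from the post or paper:

```python
import random

# Hypothetical trust scores for data aggregators; the names and values
# here are illustrative assumptions, not measurements.
SOURCE_TRUST = {
    "peer_reviewed_journal": 1.0,
    "university_library": 0.9,
    "respected_publisher": 0.8,
    "open_web_crawl": 0.2,
}

def trust_weight(doc, default=0.2):
    """Sampling weight: documents from vetted aggregators are sampled more often."""
    return SOURCE_TRUST.get(doc.get("source"), default)

corpus = [
    {"text": "an article from a vetted archive", "source": "peer_reviewed_journal"},
    {"text": "an unattributed web page", "source": "open_web_crawl"},
]

# Weighted sampling when building a training batch: provenance acts as a
# soft reliability signal rather than a hard inclusion/exclusion filter.
batch = random.choices(corpus, weights=[trust_weight(d) for d in corpus], k=2)
```

Used this way, provenance is a soft prior rather than a hard filter, so less-vetted data still contributes to training but carries less influence per document.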