>a bunch of internet pages containing things that are blatantly wrong

So Reddit?

I’d imagine the AI companies have very carefully catalogued all the “pre-AI internet” data they scraped.