Oh, ouch, yeah. We already know that misinformation tends to get amplified; the last thing we need is a starting point full of harmful misinformation. There are lots of "causal beliefs" on the internet that should have no place in any kind of general dataset.

It's even worse than that, because the causal link is extracted with just a regex, so you get

"vaccines > autism"

from the sentence

"Even though the article was fraudulent and was retracted, 1 in 4 parents still believe vaccines can cause autism."

I think this could be done much better by using even a modestly powerful LLM for the causal extraction... The website claims "an estimated extraction precision of 83%", but I doubt that's an even remotely sensible estimate.
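A rough sketch of what I mean by LLM-based extraction, assuming the OpenAI chat completions API (the model name, prompt wording, and function are placeholders of mine, not anything the project uses):

```python
from openai import OpenAI  # assumes the openai Python package and an OPENAI_API_KEY in the environment

client = OpenAI()

def extract_causal_claim(sentence: str) -> str:
    """Ask the model whether the sentence itself asserts a causal relation,
    as opposed to merely reporting a belief or a retracted claim."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": ("Extract causal claims that the author asserts as true. "
                         "Ignore relations that are only reported as other people's "
                         "beliefs, disputed, or retracted. "
                         "Answer in the form 'cause > effect', or 'none'.")},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content

print(extract_causal_claim(
    "Even though the article was fraudulent and was retracted, "
    "1 in 4 parents still believe vaccines can cause autism."))
# For this sentence the desired output is "none", since the causal
# claim is only reported as a (debunked) belief.
```

Even with an imperfect model, the point is that the extractor gets to see the whole sentence and its framing, which a local pattern match never can.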