Data provenance would be neat and a big benefit. But any solution that requires virtually all content publishers to change their approach (here: adding signing steps to their publishing workflow) is doomed to fail. There is no alternative to what OP is doing, which is trying to filter the fire hose of content into real vs. not.