> I don't know of a good solution for the second though.
I know the second issue can be a legitimate problem but I feel like the first issue is the primary problem here & the "solution" to the second issue is a remedy that's worse than the disease.
The public suffix list is a great system (despite getting serious backlash here in HN comments, mainly from people who have jumped to wildly exaggerated conclusions about what it is). Beyond that though, flagging domains for phishing for having duplicate content smells like an anti-self-host policy: sure there's phishers making clone sites, but the vast majority of sites flagged are going to be legit unless you employ a more targeted heuristic, but doing so isn't incentivised by Google's (or most company's) business model.