Google (and the other major search engines) crawl from published IP ranges, with "Google" in the user agent. They read robots.txt. They are very easy to block.
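The well-behaved crawlers can even be verified cryptographically-free, using the reverse-then-forward DNS check that Google documents for Googlebot. A rough sketch (the function name and the stub resolvers are mine, not any official API; the sample IP is illustrative):

```python
import socket

def verify_search_bot(ip,
                      allowed_suffixes=(".googlebot.com", ".google.com"),
                      resolve_ptr=lambda ip: socket.gethostbyaddr(ip)[0],
                      resolve_a=socket.gethostbyname):
    """Verify a claimed crawler IP via reverse-then-forward DNS.

    1. Reverse-resolve the IP to a hostname (PTR lookup).
    2. Check the hostname ends with one of the crawler's documented domains.
    3. Forward-resolve that hostname and confirm it maps back to the same IP,
       so an attacker can't just set a fake PTR record on their own netblock.
    """
    try:
        host = resolve_ptr(ip)
    except OSError:
        return False
    # str.endswith accepts a tuple of suffixes
    if not host.endswith(allowed_suffixes):
        return False
    try:
        return resolve_a(host) == ip
    except OSError:
        return False
```

The resolver arguments are injectable so the check can be tested (or cached) without live DNS; in production the defaults just use the socket module.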

The AI scum companies crawl from infected botnet IPs, with a user agent matching the latest Chrome or Safari.

Okay. Which, specifically, are the "AI scum" companies you're speaking of?

There are plenty of non-AI companies that also use dubiously sourced IPs and hide behind fake User-Agents.

I don't know which companies, of course. They hide their identity by using a botnet.

This traffic is new; it appeared around the same time as the recent wave of AI startups.

I see traffic from new search engines and other crawlers, but it generally respects robots.txt and identifies itself, or else comes from a small pool of IP addresses.
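That cooperative behavior is exactly what robots.txt relies on: a crawler that identifies itself can be turned away with a couple of lines. A minimal example (the bot name here is just one example of a self-identifying AI crawler):

```
# robots.txt — only effective against crawlers that identify
# themselves and choose to honor it
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
```

A botnet crawler pretending to be Chrome never matches any such rule, which is the whole complaint.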

Why do you think the bots you see are AI scum companies?