How? They run their scraping and training infrastructure - and models themselves - from within those “AI datacenters”[1] we hear about in the news - and not proxying through end-users’ own pipes.
[1]: in quotes, because I dislike the term, because it’s immaterial whether or not an ugly block of concrete out in the sticks is housing LLM hardware - or good ol’ fashioned colo racks.
Residential proxy networks.