IMO ASN-based blocking should be much more common, but unfortunately it is not supported as a first-class configuration option in many common tools.
Yeah, I don't know how anybody stays sane without it. I have a list of over a thousand ASNs I blackhole at this point...
Mine is a daily bash cronjob that fetches a text-based database and uses grep to build an nftables-apply script with all the IPs for the blocked ASNs. I keep meaning to share it, but it's embarrassingly messy and I haven't had time to clean it up...
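For anyone who wants a starting point, here is a rough Python sketch of the same idea. It is not the script described above. The input format (one "prefix ASN" pair per line), the table name `inet filter`, and the set name `blocked_v4` are all assumptions made for illustration, and the set is assumed to already exist with `flags interval` so it can hold prefixes.

```python
import ipaddress


def build_nft_script(db_lines, blocked_asns, table="inet filter", set_name="blocked_v4"):
    """Build nft commands that reload a set with the IPv4 prefixes of blocked ASNs.

    Assumptions (hypothetical, adjust to your data):
      - each db line looks like "192.0.2.0/24 AS64500" or "192.0.2.0/24 64500"
      - blocked_asns is a set of ASN numbers as strings, e.g. {"64500"}
      - the nftables set already exists and was created with "flags interval"
    """
    nets = []
    for line in db_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        prefix, asn = line.split()[:2]
        if asn.upper().removeprefix("AS") not in blocked_asns:
            continue
        net = ipaddress.ip_network(prefix, strict=False)
        if net.version == 4:  # this sketch only handles IPv4
            nets.append(net)

    # Merge adjacent and overlapping prefixes so the set stays small.
    elements = ", ".join(str(n) for n in ipaddress.collapse_addresses(nets))

    script = [f"flush set {table} {set_name}"]
    if elements:
        script.append(f"add element {table} {set_name} {{ {elements} }}")
    return "\n".join(script) + "\n"
```

The returned text can be written to a file and loaded with `nft -f`. A rule that drops traffic whose source address is in the set still has to exist separately.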
It's been a real game of cat and mouse over the last few years. I used to do daily iptables updates to block repeat scrapers on a small niche stats site I run. About 5-6 years ago it became more common to see broader ranges, so I started blocking ASNs, which worked great (esp. for the regulars like Alibaba, Tencent, compromised DigitalOcean/OVH, ...). In the last 2-3 years, though, the overall bot traffic has skyrocketed. It's easy to spot bot activity after the fact (no requests to the CDN for static assets, user agent changes from one request to the next, predictable ID enumeration, etc.) but not in real time. They're also often using residential proxies, and Cloudflare bot detection has become pretty bad.
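The after-the-fact signals listed above could be checked with something like the sketch below. It assumes the access log has already been parsed into `(ip, path, user_agent)` tuples in request order and that static asset requests show up in the same log. The `/static/` prefix, the numeric-ID URL shape, and both thresholds are made-up placeholders, not values from any real site.

```python
import re
from collections import defaultdict

# Hypothetical URL shape: a numeric ID as the last path segment, e.g. /player/123
ID_RE = re.compile(r"/(\d+)/?$")


def flag_suspects(requests, static_prefixes=("/static/",), min_requests=5, min_run=4):
    """Return {ip: [reasons]} for clients whose log history looks bot-like.

    requests: iterable of (ip, path, user_agent) tuples in arrival order.
    """
    by_ip = defaultdict(list)
    for ip, path, ua in requests:
        by_ip[ip].append((path, ua))

    suspects = {}
    for ip, reqs in by_ip.items():
        if len(reqs) < min_requests:
            continue  # too little history to judge
        reasons = []
        paths = [p for p, _ in reqs]

        # Signal 1: never fetched a static asset.
        if not any(p.startswith(static_prefixes) for p in paths):
            reasons.append("no_static_assets")

        # Signal 2: more than one user agent from the same address.
        if len({ua for _, ua in reqs}) > 1:
            reasons.append("user_agent_changes")

        # Signal 3: longest run of consecutive IDs (n, n+1, n+2, ...).
        ids = [int(m.group(1)) for p in paths if (m := ID_RE.search(p))]
        run = longest = 1 if ids else 0
        for prev, cur in zip(ids, ids[1:]):
            run = run + 1 if cur == prev + 1 else 1
            longest = max(longest, run)
        if longest >= min_run:
            reasons.append("id_enumeration")

        if reasons:
            suspects[ip] = reasons
    return suspects
```

Grouping by IP is the weak point here: it works for datacenter ranges but not for residential proxies that rotate addresses per request, which is part of why this is hard to do in real time.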
Arms races suck. I've managed to find a few L7 tricks to catch the residential proxies and serve them an empty 200, but there are obvious trivial workarounds on the other end and if I start talking about them in public they won't last long... I wish I could share :/
Cloudflare is so easy to defeat, and almost everyone in the scraping industry is selling solutions that bypass it automatically. hCaptcha solving is also really cheap nowadays.
It would still be useful to share as an example and reference point. People can use Claude Code / etc. to rewrite it for their specific situation.
It would break the internet to make this available to the average person. A large swath would actively choose to block stuff like: all of Meta, Alphabet, Apple, Amazon, etc etc etc.
Anyhoo, now you mention it this is the tack I am going to take in my own network, thanks!
Nah, they'd just pay botnet operators a few thousand bucks for proxy services.
It's a real pain in the ass because, in the absence of ASN-based blocking, you often have to give something a long list of IP ranges in CIDR notation and be certain you don't "miss" even one IPv4 /23 or /24, or a crawler will get through.
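One way to catch a missed range is to diff the blocklist against the prefixes an ASN announces. A minimal sketch, assuming you already have both lists as CIDR strings (where the announced list comes from is up to you, and the function name is just illustrative):

```python
import ipaddress


def uncovered_prefixes(announced, blocklist):
    """Return announced prefixes that are not fully covered by the blocklist.

    A prefix that is only partly covered is reported whole.
    """
    blocks = [ipaddress.ip_network(b) for b in blocklist]

    # Collapse per IP version so adjacent blocks (e.g. two /25s) count as one /24.
    merged = []
    for version in (4, 6):
        merged.extend(
            ipaddress.collapse_addresses(b for b in blocks if b.version == version)
        )

    gaps = []
    for prefix in announced:
        net = ipaddress.ip_network(prefix)
        covered = any(
            net.version == m.version and net.subnet_of(m) for m in merged
        )
        if not covered:
            gaps.append(str(net))
    return gaps


if __name__ == "__main__":
    announced = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
    blocklist = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/25"]
    print(uncovered_prefixes(announced, blocklist))
```

In the example, the two /25s together cover 192.0.2.0/24, while the other two prefixes come back as gaps.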