Frankly, I don't understand why someone would even try to crawl Hacker News.
There is an official dump that doesn't require parsing HTML at all: https://console.cloud.google.com/marketplace/details/y-combi...
These are not, er, experienced crawlers.
https://www.youtube.com/watch?v=Sbpl3ywNlpA#t=56s