The article mentions using this as a means of detecting bots, not as a complaint that it's abusive.
EDIT: I was chastised; here's the original text of my comment: Did you read the article or just the title? They aren't claiming it's abusive. They're saying it's a viable signal to detect and ban bots.
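For anyone curious what that signal looks like in practice, here's a minimal sketch: plant a URL that only ever appears inside an HTML comment, and treat any request for it as bot traffic. Everything here (the trap path, the Flask app) is illustrative guesswork, not the article author's actual setup.

    from flask import Flask, request, abort

    app = Flask(__name__)
    TRAP_PATH = "/secret-archive.zip"  # hypothetical honeypot path
    banned_ips: set[str] = set()

    @app.before_request
    def block_banned():
        # Once an IP has taken the bait, refuse everything from it.
        if request.remote_addr in banned_ips:
            abort(403)

    @app.route("/")
    def index():
        # The trap URL appears only inside an HTML comment, so browsers
        # never render it and humans never click it.
        return ("<html><body>Welcome!"
                f"<!-- <a href='{TRAP_PATH}'>backup</a> -->"
                "</body></html>")

    @app.route(TRAP_PATH)
    def trap():
        # Only a scraper that parses raw HTML, comments included, and
        # follows every URL it finds will ever request this path.
        banned_ips.add(request.remote_addr)
        abort(403)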
They call the scrapers "malicious", so they are definitely complaining about them.
> A few of these came from user-agents that were obviously malicious:
(I love the idea that they consider any Python or Go request to be a malicious scraper...)
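To be fair to the skepticism here: a default library user-agent is weak evidence on its own. The article's screen presumably amounted to something like the pattern match below; the exact strings are my guess, not the author's list.

    import re

    # Default client strings identify the HTTP library rather than a
    # browser or a declared crawler; some operators treat that as suspect.
    SUSPECT_UA = re.compile(
        r"python-requests|Go-http-client|aiohttp|curl|libwww-perl",
        re.IGNORECASE,
    )

    def looks_like_default_client(user_agent: str) -> bool:
        return bool(SUSPECT_UA.search(user_agent or ""))

    print(looks_like_default_client("python-requests/2.31.0"))           # True
    print(looks_like_default_client("Mozilla/5.0 (X11; Linux x86_64)"))  # False

Of course this flags every quick script and monitoring probe too, which is presumably the commenter's point.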
Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".[1]
[1] https://news.ycombinator.com/newsguidelines.html
The first few words of the article are:
> Last Sunday I discovered some abusive bot behaviour [...]
Yeah, but the abusive behavior is ignoring robots.txt and scraping to train AI. Following commented URLs was not the crime, just evidence inadvertently left behind.
> The robots.txt for the site in question forbids all crawlers, so they were either failing to check the policies expressed in that file, or ignoring them if they had.
Crawlers ignoring robots.txt is abusive. That they then start scanning all docs for commented URLs just adds to the pile of scummy behaviour.
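For reference, honoring robots.txt takes a handful of lines with Python's standard library, so skipping the check is a choice rather than a technical hurdle. The URL below is a placeholder, not the site from the article.

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # With a blanket "User-agent: *" / "Disallow: /" policy, this is
    # False for any crawler that isn't explicitly allowed.
    if rp.can_fetch("MyScraper/1.0", "https://example.com/some/page"):
        print("allowed to fetch")
    else:
        print("forbidden; a polite crawler stops here")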
Human behavior is interesting: me, me, me…