Hacker News

This is because whoever owns FiveThirtyEight now (ABC?) deleted the whole archive of articles on the site.

Don't we need more than an index of Archive.org, because whoever controls the domain could robots.txt these out of existence if they wanted to?

ycombinete an hour ago

Archive.org mostly ignores robots.txt

https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

zzo38computer 10 minutes ago

The robots.txt file should be used to restrict (and, in some cases, slow down) crawling at the time it is being crawled, not for SEO or for restricting access to mirrors or for any other purpose. It should never apply retroactively. (Unfortunately it is sometimes used badly despite this.)
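To make the crawl-time point concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `ExampleBot` agent name, the example.com URLs, and the `crawl_decision` helper are all hypothetical. The parser only answers questions about a fetch happening now: may this agent fetch this URL, and how long should it wait between requests (`Crawl-delay`, a non-standard directive that some crawlers honor). Nothing in it says anything about copies captured earlier.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would fetch it from
# https://<host>/robots.txt at crawl time via set_url() + read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def crawl_decision(robots_txt: str, agent: str, url: str) -> Tuple[bool, Optional[int]]:
    """Hypothetical helper: evaluate robots.txt for a single fetch right now."""
    parser = RobotFileParser()
    # parse() takes the file's lines and records when it was checked.
    parser.parse(robots_txt.splitlines())
    # can_fetch(): may this agent fetch this URL at this moment?
    allowed = parser.can_fetch(agent, url)
    # crawl_delay(): the Crawl-delay value that applies to the agent, or None.
    delay = parser.crawl_delay(agent)
    return allowed, delay


if __name__ == "__main__":
    print(crawl_decision(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page"))  # -> (False, 10)
    print(crawl_decision(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page"))   # -> (True, 10)
```

In a live crawler you would call `set_url()` and `read()` just before crawling, so the decision reflects the file as it exists at that moment rather than being applied to pages fetched in the past.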

Jiro 20 minutes ago

People always use that link as a reference to say that the Internet Archive ignores robots.txt, but it only actually says they are ignoring it for government sites. It suggests that they might do the same for other sites in the future (as of 2017), but it does not actually say that they have done so.

https://blog.archive.org/2018/04/24/addressing-recent-claims..., posted a year later, mentions that they have an automated process that still follows robots.txt when displaying old pages, in cases where the robots.txt was added later.

https://help.archive.org/help/using-the-wayback-machine/ does say they follow it for scraping, but it is phrased in a way that would still be true for past sites whether or not they changed the policy. There is a page, https://www.sysjolt.com/2021/archive-org-no-longer-honors-ro..., which claims they don't follow it, but the site owner misspelled "robots" as "robot".

Avicebron 3 hours ago

Bourdieu. The field has structure, the structure has logics, the logics shape what counts as a publishable story, a promotable journalist, a credible source, a "balanced framing".

tantalor 3 hours ago

Please, say that again in comprehensible English.

Avicebron 2 hours ago

The ownership relationship was always load-bearing? The journalism in this case was a tenant. I highly recommend that people promote forms of independent journalism?

EDIT: Dude, have you heard of the s in https? http://johntantalo.com gets flagged.