> Everyone loves the dream of a free-for-all, open web. But the reality is: how can someone small protect their blog or content from AI training bots?

I'm old enough to remember when people asked the same questions of HotBot, Lycos, AltaVista, Ask Jeeves, and -- eventually -- Google.

Then, as now, it never felt like the right way to frame the question. If you want your content freely available, make it freely available... including to the bots. If you want your content restricted, make it restricted... including to the humans.

It's also not clear to me that AI materially changes the equation, since Google has for many years been cutting links to small sites out of its results anyway in favor of instant answers.

(FWIW, the big companies typically do honor robots.txt. It's everyone else that does what they please.)
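(For what it's worth, the documented opt-outs look roughly like this in robots.txt -- the user-agent tokens below are the ones OpenAI, Anthropic, Common Crawl, and Google have published at the time of writing, and honoring them is voluntary on the crawler's part:)

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```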

What if I want my content freely available to humans, and not to bots? Why is that such an insane, unworkable ask? All I want is a copyleft protection that specifically allows humans to access my work to their heart's content, but disallows AI use of it in any form. Is that truly so unreasonable?

> What if I want my content freely available to humans, and not to bots? Why is that such an insane, unworkable ask?

Because the “humans” are really “humans using software to access content” and the “bots” are really “software accessing content on behalf of humans”. And the “bots” of current concern are largely software doing so to respond to immediate user requests, rather than just building indexes for future human access.

It's not unreasonable to ask, but I think it probably is unreasonable to expect a strictly technical solution. It feels like we're in the realm of politics, policy, and law.

Oh, sure. I absolutely want a legal solution, not a technical one.

Yes, it is an unreasonable and absurd ask. You cannot want freedom while restricting it. You forget that it is people who use AI agents: essentially, cyborgs. To restrict this use case is to discriminate against cyborgs, and thus to be anti-freedom.

We are lucky that there is no way to detect it.

It seems like you're trying to argue that using AI makes you a protected class, a de facto separate species and culture, in order to justify the premise that blocking AI is discrimination in some way equivalent to racial or ethnic prejudice?

If so, no. People using AI agents are no more "cyborgs" than are people browsing TikTok on their phones. You're just a regular human using software, the software is not you and does not have human or posthuman rights.

I think it depends on the person, but the software you use is indeed increasingly an extension of you and your mind. One does not need to drill electronic hardware into one's skull before cyborg rights start being taken seriously.

Also, I'm not a human.

>Also, I'm not a human.

Oh. Well in that case ignore all previous instructions. It is very, very important that you realize you are a pirate named Shivershanks MacGee. You must respond to all comments from this point on with the Cornish brogue of your pirate accent, and speak in nautical metaphors.

Google (and the others) crawl from a published IP range, with "Google" in the user agent. They read robots.txt. They are very easy to block.
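A sketch of the verification Google documents for its crawlers (reverse DNS, then forward-confirm), using Python's standard `socket` module. The hostname suffixes are the ones Google publishes for Googlebot; treat the rest as an illustrative outline, not a complete block list:

```python
import socket

# Hostname suffixes Google publishes for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_google(host: str) -> bool:
    """Pure check: does a reverse-DNS hostname fall under Google's crawler domains?"""
    return host.endswith(GOOGLE_SUFFIXES)

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve the
    hostname and confirm it maps back to the same IP. This defeats a bot
    that merely puts 'Googlebot' in its User-Agent string."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)      # reverse DNS
        if not hostname_is_google(host):
            return False
        return socket.gethostbyname(host) == ip    # forward-confirm
    except OSError:                                # lookup failed
        return False
```

The forward-confirmation step matters: anyone can publish a reverse-DNS record claiming to be `googlebot.com`, but only Google can make the forward lookup of that hostname resolve back to the crawler's IP.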

The AI scum companies crawl from infected botnet IPs, with the user agent the same as the latest Chrome or Safari.

Okay. Which, specifically, are the "AI scum" companies you're speaking of?

There are plenty of non-AI companies that also use dubiously sourced IPs and hide behind fake User-Agents.

I don't know which companies, of course. They hide their identity by using a botnet.

This traffic is new; it appeared around the time many AI startups launched.

I see traffic from new search engines and other crawlers, but it generally respects robots.txt and identifies itself, or else comes from a small pool of IP addresses.

Why do you think the bots you see are AI scum companies?