Why do this though?

It's like if someone was trying to "trap" search crawlers back in the early 2000s.

Seems counterproductive.

Because of bots that don't respect robots.txt.
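
For reference, opting out via robots.txt takes only a few advisory directives. A minimal sketch (GPTBot and CCBot are illustrative user-agent tokens; check each vendor's docs for the current names); the complaint here is precisely that these directives are advisory and misbehaving crawlers ignore them:

    # Ask AI crawlers to stay away (advisory only; nothing enforces this)
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # All other crawlers may still index everything
    User-agent: *
    Allow: /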

If you want an AI bot to crawl your website while you pay for that bandwidth, then you won't use the tool.

If bandwidth cost is a concern, then maybe you should reconsider how you publish your site.

Like, what if you actually post something that gains traction? Is it going to bankrupt you or something?

It's not just financial; you're taking up a lot of bandwidth, resources, etc.

It's not just some light bump in traffic. It's a headache that wouldn't need to be dealt with if they respected robots.txt. Quite simple, really.

Search crawlers used to bring people TO your site; LLM bots are used to keep people OUT of your site, because knowledge is indexed and distributed by corporations.

So if your site is dependent on ads, and the only way for people to see those ads is by coming to your site, then yes, you lose.

If your site exists to share information, then the information gets disseminated; whether that happens via an LLM or a browser makes no difference to me.

Those are not the only two options.

Why are you presenting the latter option as if it were mainstream? It's such a small percentage of use cases that it probably isn't even a rounding error.

People who want to disseminate information also want the credit.

I'd still like to know why you are presenting this false dichotomy. What reason do you have for presenting a use case that accounts for a fraction of a percent as if it were a standard use case? What is your motivation behind this?

My only motivation is that it pains me to see smart capable people working on insignificant problems.

Maybe I don't understand the problem as well as I should, and I'm open to hearing what it is you think that I'm missing.

But from my perspective, this is a solution for a non-problem, which in my eyes is a problem itself.

You misunderstand: I am asking what your motivation is for presenting a 0.0001% use case as if it were a 50% use case.

The use case you present is so small it can be ignored as an option, yet you present it as the only other option.

> People who want to disseminate information also want the credit.

This is psychological projection.

> This is psychological projection.

You don't know what that means.

In any case, people who want to disseminate information with credit can do so without standing up a blog (any place that allows posting comments, such as Reddit, HN, etc.).

In the context of this discussion, we're talking about site owners: people who put up a blog.

You don't get attribution for your work if it merely feeds into its training data.

That assumes the AI bots are scraping for training data and not doing simple retrieval/RAG (which would likely provide attribution).

Web crawlers didn't routinely take down public resources or use the scraped info to generate facsimiles that people are still having ethical debates over. Their presence didn't even register, and their indexing actually helped sites. It isn't remotely the same thing.

https://www.libraryjournal.com/story/ai-bots-swarm-library-c...

AI bots must've taken down that link you shared, it won't load :/

And search crawlers/results have been producing snippets that prevent users from clicking through to the source for well over a decade.

Edit: it loaded. I don't see how the problem isn't simply solved by an off-the-shelf solution like Cloudflare. In the real world, you wouldn't open up a space/location if you couldn't handle the throughput. Why should online spaces/locations get special treatment?

Why should everyone else pay the price for VC-funded, private companies? They should incur the cost.

This is no different than saying “robbers aren’t causing any problems, you just need to lock your doors, buy and set up sensors on every point of potential ingress, and pay a monthly cost for an alarm system. That’s on you.”