robots.txt seems like it should be a legally binding terms of service, which would make them outright copyright infringers.
Sue for $180,000 per infringement, calculated for each illegal API call.
Was your robots.txt written by a lawyer? Does it hold up in court?
OpenAI might in fact be a good target for a suit like this at the moment. Even if your argument is weak, they may be eager to settle generously if your suit threatens to slow down their IPO in some way. But I happen to think this is a reasonable argument: I put up a sign that says not to do something with my property, and you went ahead and did it anyway, costing me money. IANAL, but that seems like a straightforward tort, no?
Contracts are legally binding even if they weren't written by a lawyer. Copyright is legally binding even if no copyright claim is explicitly stated.
I looked into this a bit (not a lawyer) and it seems that robots.txt isn't legally binding on either party, but that has two major implications for AI agents (and crawlers/scrapers in general).
First, even if the robots.txt says you can crawl the site, that isn't a copyright grant of any kind or permission to copy/use that data outside of the permissions granted by the TOS.
Second, ignoring the robots.txt while also pirating the site contents could point to bad faith and make a much stronger case for double-damage penalties due to willful infringement.
If the site TOS doesn't explicitly grant an AI agent rights to copy out the site content AND the AI agent is ignoring the robots.txt at the same time, it seems a lot more likely that there's a strong copyright infringement case against the agent owner.
It doesn't have to be written by a lawyer. The robots.txt file is an administrative directive from the webmaster of the website that you, being a scraper, MUST NOT go to page x and/or y, or MUST NOT go to directory z. All the law would have to say is that it is a crime to disobey these directives. It's similar to trespassing: if I put up a sign that says "DO NOT ENTER" in bright red letters on a door in my apartment, or "authorized people only!", that is still legally binding and a court isn't going to care that it wasn't lawyer-authored. The court will only care that you were told not to enter that area, but did so anyway.
It doesn't matter. Robots.txt is not a license; it's a set of computer-parsable directives for how programs should access your site. The actual license doesn't have to be written for computers to parse in order to be legally binding.
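To make the "computer-parsable directives" point concrete, here is a minimal sketch of how a well-behaved crawler could check a robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the `example.com` URLs, and the helper `is_allowed` are all made up for illustration; nothing here says anything about how any real crawler behaves or about legal effect.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler entirely,
# and keep every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The check is entirely voluntary on the crawler's side: the file only tells a program what the site operator wants, and nothing in the protocol stops a client from skipping the lookup.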
A person should be able to write a terms of use or license page on their website that says "Do not include any content from this website in your AI training data. If you do, you will be billed $100 billion." And it should be enforceable. It just turns out that nerds like to say "oh that would be too hard or too expensive, so we're going to ignore it."