IP attorney here and actively working on this problem.
NLA: if you create content online (public repo code, blog, podcast, YouTube, publishing), the smartest thing you can do is to file a US copyright, even if you only have a hobby blog.
Anthropic paid $1.5B in a class settlement to authors because it was piracy of copyrighted works. If we as an HN community had our works protected, there are potentially huge statutory damages for scraping by any and all LLMs. I work with hundreds of writers and publishers and am forming a coalition to protect and license what they're creating.
Anthropic didn't lose because they scraped (read) copyrighted works. They lost because they distributed copyrighted works directly via torrents. Those aren't the same.
I'll bite. I have always been told copyright is inherent. Does it cost money to file a copyright? Do I need to do it for each blog post? For each gist? I'll totally set up some scripts to make it happen if that's what actually needs doing to have the copyright I expected.
Edit: remember not to downvote ideas you disagree with. I think the rule was to downvote only things that lower the discourse.
I think it depends on the country. In Germany, everything you write is automatically copyrighted, unless you explicitly waive it. In the US, it's the other way around: you have to explicitly state that you want copyright (can somebody confirm this?).
I'm not a lawyer, but I guess a German posting on Hacker News effectively waives their copyright by sending their comment to the US, where a US company then publishes the comment on a US server.
You do have inherent copyright whenever you post, but that puts the burden on you to prove damages (i.e., how much financial harm you suffered from one LLM's piracy alone). Filing fees are $65 for online registration, and registering allows you to claim attorney's fees and statutory damages. Statutory damages can range from $750 to $150k USD per LLM because you registered it.
So yes, set up some scripts. You can go back 90 days from when you file (you get a grace period). Also, if you're publishing frequently to a blog, repo, or newsletter, you can save on cost by filing the articles together under a group registration. Ping me if you need help.
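For anyone scripting this, here is a rough Python sketch of the selection step only: pick the posts published in the 90 days before a planned filing date and split them into batches for group filings. The `Post` type, the function names, and the `max_per_group` parameter are all made up for illustration; the real per-group limit and eligibility rules come from the Copyright Office, so treat the numbers as placeholders rather than the actual requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one published item (blog post, gist, etc.)."""
    title: str
    published: date


def posts_in_window(posts, filing_date, window_days=90):
    """Return posts published in the `window_days` before `filing_date`, oldest first.

    The 90-day default mirrors the grace period mentioned above; it is an
    assumption of this sketch, not a statement of the legal rule.
    """
    cutoff = filing_date - timedelta(days=window_days)
    eligible = [p for p in posts if cutoff <= p.published <= filing_date]
    return sorted(eligible, key=lambda p: p.published)


def batch_for_group_filing(posts, max_per_group):
    """Split posts into consecutive batches of at most `max_per_group` items.

    `max_per_group` is a placeholder; look up the real limit before filing.
    """
    if max_per_group < 1:
        raise ValueError("max_per_group must be at least 1")
    return [posts[i:i + max_per_group] for i in range(0, len(posts), max_per_group)]


if __name__ == "__main__":
    sample = [
        Post("Old post", date(2025, 3, 31)),
        Post("Spring post", date(2025, 4, 1)),
        Post("May post", date(2025, 5, 15)),
        Post("Filing-day post", date(2025, 6, 30)),
    ]
    window = posts_in_window(sample, filing_date=date(2025, 6, 30))
    for number, batch in enumerate(batch_for_group_filing(window, max_per_group=2), start=1):
        print(number, [post.title for post in batch])
```

Feeding it from an RSS feed or a repo's commit history is left out on purpose, since that part depends entirely on where you publish.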
Doesn't the mere act of publishing your original content online grant you copyright?
Statutory damages require registration.
Wait what do you mean by "file a copyright"? I have never heard of this, all explanations of copyright I have heard say that you automatically own the copyright to the things you make; and that "all rights are reserved" by default unless you give up on them through granting a license. Is this no longer the case? Why is this now suddenly different? When did it change?
I hear this a lot! What's suddenly different for the web is the volume of scraping. And the fact that the sum of that scraping is building companies with trillion-dollar valuations.
There are tens of millions of registered copyrights in the US: nearly every published book, music, artwork, many magazines, and major websites. Here's the official link, where you can search the registry and find a ton of info: https://www.copyright.gov/registration/
Briefly, there is default copyright and registered copyright. Registering works grants stronger protections (i.e. bigger fines if broken).
No one will ever do this, or definitely not enough people will, so what's Plan B?
Bigger portion of the payout for those that do?
The only thing worse than a mega corp is an IP attorney.
Your cause is already lost.
Good luck enforcing whatever frivolous lawsuits you have cooking against open-weights Chinese models that anyone with a newer graphics card can crank out inference on.