Hacker News

20k 2 hours ago

Yes, and realistically any code that LLMs produce is a derivative work of its training data. There's going to be a huge disaster licensing-wise.

I have absolutely no idea how LLMs got through anyone's legal department. I guess the hope is that if everyone breaks the law enough, it'll just be fine.

bcjdjsndon 2 hours ago

Problem is there's a lot more than a single repo in the training data; the corpus is massive. Should the author of a blog post on cats also be compensated simply for being in the same training data as the git repo?