I'm not a luddite.
And I don't think that my argument was as narrow as you make it out to be.
An AI doesn't need to exactly reproduce training material to output something that wouldn't survive a "fair use" trial.
"Summarize XY, but prefer different words" is already enough for a blog post. And that possibility isn't limited to inference-time input.
Copyright law is about humans, not machines. The problem is scale. You deflected this argument instead of addressing it.
And regarding training: you seem to anthropomorphize LLMs in a weird way.
LLMs can only generate content that is entirely derived from their training data.
That the derivation is close to a black box for humans does not elevate machines to humans.
The burden of proof about training materials is IMO with LLM companies, not with human creators.
Because companies know full well that anything short of an obvious exact reproduction will require humans to file lawsuits in order to claim a copyright violation.
You say:
> - We already have remedies for those publishing or selling copyrighted material already
And I say, with regard to AI, you seem to be intentionally misinterpreting my comment.