The media industry loves to quote ridiculous numbers on lost revenue due to piracy etc. May be a rough ballpark numbers will get them to do something about this theft.
Can someone put a rough estimate on potential revenue loss (direct and incidental) from training AI with industry wise breakup.
It’s wrong to stop progress. I just want to know what data went into my model and have access to the same data. The same way we have national libraries of books but with the caveat that I don’t really know how one is supposed to browse petabytes of OpenAI .zips like I browse old books.
If the data is proprietary (eg Meta’s stash of FB comments) then I am satisfied to be told it’s private and I can’t see it. If, however, the works were public then give me a URL if it’s live or a cached copy if it isn’t.