> Publishing a crawl, or the URL's, under CC-0, CC-by, BSD, or Apache would make them usable without restrictions or any further legal analyses.

This isn't true, and I can't imagine that any lawyer would agree with this statement. CCF does not own the rights to any of the bytes in our crawl, so we cannot grant you any rights to those bytes. Nothing that we could say could have any relationship to this legal issue.

It's confusing to me that you say this. Your own organization claims in the Terms of Service that it has rights over the crawls, even restricting how they are used. Now you are telling me you believe you have none, or that no lawyer would consider this. If so, why are "Crawled Content" and restrictions on its use in your Terms of Service?

Very simply, if what you say is true, then you need to change your Terms to reflect that. You have two options:

1. Take crawled content out of the Terms of Service. Put a permissive license on the crawls.

2. Modify your Terms to say "crawled content" can be used for any purpose and distributed free with no restrictions. You currently impose extra restrictions, though.

That's contract law, maybe with copyright elements in it. Yet you also appear to believe your crawls aren't copyrightable. That's a huge unknown, because collections are copyrightable when sufficient creativity is put into them:

https://en.m.wikipedia.org/wiki/Copyright_in_compilation

Many collections claim a copyright or have a permissive license for this reason. Again, simply saying your crawls and URL databases are permissively licensed would solve that problem. It takes just one edit on a few web pages.

If crawls and DBs are truly without restrictions, please put a permissive license on their respective pages. Also, please change your Terms to put no restrictions on Crawled Content. Instead, they should say something like: it's free to use and distribute, with no warranty from you and no liability on you. The usual stuff.

I'll emphasize again that a permissively licensed list of all URLs you've crawled is one of the most valuable changes you could make.

You made me sad that I attempted to reply.

Edit: spelling

You told me you made no legal claims on your crawled content. You implied you wanted it to be free for all uses.

I linked to your Terms of Service, which claim control of "Crawled Content" with restrictions. I asked you to remove those parts or change them to BSD, etc., for full permissions.

You denied that any legal claims existed, even though your Terms and compilation copyright both carry legal claims. I explained how you can fix that by changing your Terms and download page to be permissive.

Now, you are sad that you attempted to reply? Shouldn't you be happy that a Common Crawl fan who sees great value in your work warned you about restrictive Terms and unclear licensing? What's sad about that?

I am very grateful for the work that your organization does. I'd like to promote it for many public-benefit uses, from machine learning to Google alternatives. I can't do that if the Terms are restrictive and the licensing is unclear. Please fix it so your supporters can tell potential users that the crawls and DBs themselves are zero risk on your end.

[deleted]