They do, don't they? I think OpenAI uses LibGen.

Meta managed to get into a private ebook torrent tracker called Bibliotik a few years ago and used it to train Llama, and the resulting publicity essentially killed the tracker.