Isn't this just a desirable property of LLMs? They would be pretty useless if a piece of information had to make up a significant fraction of the training data before the model could learn anything from it.