Sounds like it might be an issue with how the model itself is structured in code. If the 250 number stays the same regardless of model size, that points to something shared by nearly all the AI models being built today. GGML? PyTorch? Transformers? I think the issue lies somewhere in that area.

Isn't this just a desirable property of LLMs? They would be pretty useless if a piece of information had to make up a significant share of the training data before the model learned anything from it.