>There are grammar rules, they are just very weak because the structure of human language is generally quite weak. When presented with languages which have strong consistent grammars the weights are very easily interpretable as a grammar: https://arxiv.org/abs/2201.02177
That paper did not train the models on 'a language with strong consistent grammars'. Mathematical Operation tables are not a language. Grammar itself is a post-hoc rationalization, and there's no evidence LLMs follow 'grammar rules' any more than the brain follows grammar rules. Of course, that's not to say transformers can't learn simple rules if the dataset calls for it.
> Mathematical Operation tables are not a language.
Not a natural language, but they are certainly a language in the sense of a symbolic representation of information.
A language is a set of sentences.
A sentence is a finite sequence of symbols drawn from an alphabet.
In this sense, mathematical operation tables are absolutely a language. As are natural languages.
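To make the formal definition concrete, here is a minimal sketch of an operation table written as a set of sentences. The choice of addition modulo 5, the `a + b = c` layout, and all names (`P`, `ALPHABET`, `LANGUAGE`, `is_sentence`, `in_language`) are my own assumptions for illustration, not the paper's actual setup.

```python
from itertools import product

# Hypothetical toy example: the table for addition modulo P.
P = 5

# The alphabet: one symbol per residue, plus the operator and equals sign.
ALPHABET = {str(d) for d in range(P)} | {"+", "="}


def is_sentence(symbols):
    """A sentence is a finite sequence of symbols drawn from the alphabet."""
    return all(sym in ALPHABET for sym in symbols)


# The language: the set of all sentences "a + b = c" that the table contains.
LANGUAGE = {
    (str(a), "+", str(b), "=", str((a + b) % P))
    for a, b in product(range(P), repeat=2)
}


def in_language(symbols):
    """Membership test: is this sequence one of the table's sentences?"""
    return tuple(symbols) in LANGUAGE


print(len(LANGUAGE))                           # 25 sentences, one per table cell
print(in_language(["2", "+", "4", "=", "1"]))  # True, since (2 + 4) % 5 == 1
print(in_language(["2", "+", "4", "=", "3"]))  # False, well-formed symbols but not in the table
```

Note that the last sequence is still a valid sentence over the alphabet; it just isn't a member of this particular language. That is all the formal definition requires.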
>A language is a set of sentences. A sentence is a finite sequence of symbols drawn from an alphabet.
A language is a structured system of communication used to express arbitrary ideas between multiple parties. Math operation tables do not, and cannot, do that on their own.
That distinction matters here because we are talking about what properties the model is expected to learn. English and operation tables are fundamentally different objects, so it is not surprising that a model learns different kinds of structure from them.