The Merge operation in Chomsky's later Minimalist Program is similar in a lot of ways to a transformer's softmax-weighted merging of representations into the next layer.
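For concreteness, here's a minimal sketch of the softmax-weighted merging I mean (plain NumPy, single head, no mask; names like `attention_merge` and the toy sizes are just illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_merge(X, Wq, Wk, Wv):
    """Single-head attention: each position's next-layer representation
    is a softmax-weighted merge of the value vectors of all positions."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise compatibility
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # blended representations

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
merged = attention_merge(X, Wq, Wk, Wv)
print(merged.shape)  # (4, 8): one merged vector per token
```

The loose analogy is that both take two or more existing representations and combine them into a single new unit at the next level, though Merge is a discrete, binary operation while attention is a soft, weighted blend over all positions.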

There's also still a lot to his arguments that we are much more sample efficient. It isn't like monkeys learn language at a GPT-2 level and bigger brains take us to GPT-8 or whatever; there's a step change where they don't really pick things up linguistically at all and we do. But with a lot more data than we ever get, LLMs seem to distill some of the broad mechanisms of what may be our innate ability, though that ability still seems to have a large learned component in us.