> This one is bizarre, if true (I'm not convinced it is).

> The entire purpose of the attention mechanism in the transformer architecture is to build this representation, in many layers (conceptually: in many layers of abstraction).

I think this is really about a hidden (i.e. not readily communicated) difference in what the word "meaning" means to different people.

Could be. By "meaning" I mean (heh) that transformers are able to distinguish between tokens (and prompts) in a consequential ("causal") way, and that they do so at various levels of detail ("abstractions").

I think that's the usual understanding of how transformer architectures work, at the level of math.
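For concreteness, here's a minimal sketch of that math (my own toy example: NumPy, random weights, a single head, no feed-forward or normalization blocks): each layer of scaled dot-product attention re-mixes every token's vector with context-weighted contributions from the other tokens, so the per-token representation gets refined layer by layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) token representations entering this layer
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each row says how strongly one token attends to every other token
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    # New representation: context-weighted mixture of value vectors
    return weights @ V

d_model, seq_len, n_layers = 16, 5, 3
X = rng.normal(size=(seq_len, d_model))  # stand-in for token embeddings

for _ in range(n_layers):
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
    X = X + attention_layer(X, Wq, Wk, Wv)  # residual connection, as in the real thing

print(X.shape)  # (5, 16): one contextual vector per token, refined layer by layer
```

Whether you want to call those refined vectors "meaning" is, I think, exactly the terminological disagreement above.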