The term "LLM" muddies your point, because according to Yann, VLMs belong in the same bin.

Using the term "autoregressive models" instead might help.

Diffusion models are not autoregressive, but they have the same limitations.