Yeah, you don't know anything about LLM architectures. They often change with each model release.

You also aren't aware that there's more to it than "LLM architecture". And you're rather confident despite your lack of knowledge.

You're like the old LLMs from before ChatGPT was released: kinda neat, but usually wrong and overconfident about it.

It's still just attention and next-token prediction, nothing else.

The only new innovation is MoE, and that's an efficiency optimization, not an architectural rethink; it shows up everywhere from local models to the "SOTA" cloud offerings you're so fond of.

You're not listening. I give up. Go learn on your fruit phone.

LLMs are literally next-token prediction engines and nothing else.
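For what it's worth, that claim is easy to make concrete: autoregressive generation is just a loop that predicts one token and appends it to the context. Here's a minimal sketch where a toy bigram table stands in for a real transformer's forward pass (the table, tokens, and probabilities are made up purely for illustration):

```python
# Toy stand-in for a model: maps the last token to next-token probabilities.
# A real LLM computes this distribution with attention layers instead.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"down": 1.0},
}

def next_token(tokens):
    """Pick the highest-probability next token given the last token (greedy decoding)."""
    candidates = BIGRAMS.get(tokens[-1])
    if not candidates:
        return None  # model has nothing to predict
    return max(candidates, key=candidates.get)

def generate(prompt, max_new_tokens=5):
    """The whole 'engine': predict one token, append it, repeat."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok is None:
            break
        tokens.append(tok)
    return " ".join(tokens)

print(generate("the"))  # -> "the cat sat down"
```

Swap the bigram lookup for a transformer forward pass and greedy argmax for temperature sampling, and that loop is structurally what every decoder-only LLM does at inference time.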

Diffusion for text is barely an academic toy at this point and will likely never be a real thing.