It's still attention and next-token prediction and nothing else.
The only new innovation is MoE, a technique used to optimize local models rather than the "SOTA" cloud offerings you're so fond of.
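For anyone unfamiliar with what MoE actually is: the layer routes each token to a small subset of expert sub-networks, so compute scales with the number of experts *used*, not the number that exist. Here's a minimal sketch of top-k gating; all shapes, weights, and names are invented for illustration, not any production model's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: a gate scores experts per token, the top-k
# are selected, and the output is the gate-weighted sum of their outputs.
d_model, n_experts, top_k = 8, 4, 2
W_gate = rng.normal(size=(d_model, n_experts))
W_experts = rng.normal(size=(n_experts, d_model, d_model))

def moe_forward(x):
    """x: (d_model,) one token's hidden state -> (d_model,) output."""
    logits = x @ W_gate                       # (n_experts,) gate scores
    top = np.argsort(logits)[-top_k:]         # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the chosen experts
    # Only the selected experts run, so compute scales with k, not n_experts.
    return sum(w * (x @ W_experts[e]) for w, e in zip(weights, top))

out = moe_forward(rng.normal(size=d_model))
print(out.shape)
```

The sparsity is the whole point: it cuts inference cost per token, which is why it shows up wherever serving cost matters.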
You're not listening. I give up. Go learn on your fruit phone.
LLMs are literally next-token prediction engines and nothing else.
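Mechanically, "next-token prediction engine" means exactly this loop: predict a distribution over the next token, pick one, append, repeat. A toy bigram table stands in for the model below; the vocabulary and probabilities are made up purely to show the loop's shape.

```python
# Toy next-token prediction: generation is "predict, append, repeat".
probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}

def generate(token, max_len=10):
    out = [token]
    for _ in range(max_len):
        nxt = max(probs[token], key=probs[token].get)  # greedy argmax decode
        if nxt == "<eos>":
            break
        out.append(nxt)
        token = nxt
    return out

print(generate("the"))  # ['the', 'cat', 'sat']
```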
Diffusion for text is barely even an academic toy at this point and will likely never be a real thing.