Hacker News

I don't think you're really getting the point I'm trying to make. Everyone who regularly trains LLMs cares about serving users at scale and about quality per unit of compute invested. It's not just OpenAI or Anthropic or Google. Qwen, DeepSeek, Moonshot, whatever. They all care about it very much and basically can't afford to take a step back in those areas.

Since training models is currently a very expensive procedure, diffusion LLMs are destined to be relegated to the occasional research artifact at best. As things stand, making a serious commitment to them is basically the equivalent of throwing money into a fire pit, and things are expensive enough as is.

Alternative architectures that do a much better job of matching transformers in quality have basically gone nowhere, yet you expect one that is basically worse in every way the labs care about to fare better? I'm not trying to 'dismiss' dLLMs. I'm interested in them for the same reason you are. I'm just stating the factors at play plainly.