Hacker News

I'm particularly curious to see how this plays out, and I seriously hope more labs focus on diffusion models for text generation.

My immediate thought - this performs slightly worse than the autoregressive gemma equivalent, but diffusion variants may also let me run better models at speeds that are actually usable.

Ex - I can run 70b-120b autoregressive models locally right now, but I get ~5-15t/s, which just isn't fast enough for serious work.

That caps me at the 20-36b models (ex - gemma4), where I can get 100+ t/s on the same hardware.
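
For a rough sense of what those throughputs mean in wall-clock time, here is a back-of-the-envelope sketch. The 1,000-token response length is an illustrative assumption, and the helper name `seconds_to_generate` is hypothetical. The t/s figures are the ones quoted above, with 100 standing in for "100+".

```python
def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to emit `tokens` at a steady decode rate."""
    return tokens / tokens_per_second


RESPONSE_TOKENS = 1000  # assumed response length, for illustration only

# Throughputs quoted above: ~5-15 t/s for 70b-120b, 100+ t/s for 20-36b.
scenarios = [
    ("70b-120b, low end", 5),
    ("70b-120b, high end", 15),
    ("20-36b", 100),
]

for label, tps in scenarios:
    wait = seconds_to_generate(RESPONSE_TOKENS, tps)
    print(f"{label}: {wait:.0f}s for {RESPONSE_TOKENS} tokens")
```

Under that assumption the larger models take roughly one to three minutes per response, versus about ten seconds for the smaller ones. That gap is what a faster diffusion variant of a large model would need to close.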

So the question becomes - does the quality drop from a diffusion model outweigh the quality bump from using a larger model?

Because if not... sounds like diffusion models have a lot of space to thrive.

---

Sadly - if they can't be hosted profitably, I question whether this space will actually be explored.