I'm no expert (just a monkey... ;) ), but isn't diffusion supposed to generate ALL of the output at once? From their diagram, it looks like their I-LDM model uses previously generated context to generate the next tokens (or blocks).

Block autoregressive generation can give you big speedups.

Consider that outputting two tokens at a time gives you a (2-epsilon)x speedup over running one token at a time. As your block size increases, you quickly get fast enough that it doesn't matter sooooo much whether you're doing blocks or actual all-at-once generation. What matters, then, is whether there's a quality trade-off for moving to block-mode output. And here it sounds like they've minimized that trade-off.
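To put rough numbers on that, here's a toy sketch. The cost model is my own assumption, not anything from the paper: one forward pass that emits a block of B tokens costs 1 + overhead * (B - 1) single-token passes, where `overhead` plays the role of the epsilon above. The function name and the 5% default are made up for illustration.

```python
def block_speedup(block_size: int, overhead: float = 0.05) -> float:
    """Speedup of block-at-a-time decoding over one-token-at-a-time decoding.

    Assumed (hypothetical) cost model: a pass that emits `block_size` tokens
    costs 1 + overhead * (block_size - 1) single-token passes.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    cost_per_block = 1.0 + overhead * (block_size - 1)
    # Tokens produced per unit of cost, relative to 1 token per unit.
    return block_size / cost_per_block


if __name__ == "__main__":
    for b in (1, 2, 4, 8, 16, 32):
        print(f"block size {b:>2}: ~{block_speedup(b):.2f}x")
```

Under this model a block size of 2 gives a bit under 2x, and the speedup flattens out toward 1/overhead as blocks grow, which is the "doesn't matter so much past a point" effect. Real numbers depend on the hardware and the model, so treat this as arithmetic, not a benchmark.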

Can it go back and use future blocks as context? That's what I'm most interested in here: fixing line 2 because of a change/discovery we made in the process of writing line 122. I think that problem is a big part of the short-sightedness of current coding models.

Exactly. The current (streaming) way means that once it makes a decision, it's stuck with it. For example, variable naming: once it names a variable something, it's stuck using that name from then on, whereas a human would just go back and change the name.

Maybe "thinking" will fix this aspect, but I see it as a serious shortcoming.