Masked language modeling has been loosely compared to text diffusion [1], so the paper's title claim may be true in some sense, even if it's misleading.

[1] https://nathan.rs/posts/roberta-diffusion/