It's being explored right now for speculative decoding in the local-LLM space, which I think is quite an interesting use case.
https://www.emergentmind.com/topics/dflash-block-diffusion-f...
DFlash immediately came to my mind.
There are already several Mac implementations of it that run Qwen3.5 more than 2x faster.