Is anyone doing any form of diffusion language model that's actually practical to run today on the machine under my desk? There are loads of more "traditional" .gguf options (well, quants) that are practical even on shockingly weak hardware, and I've been seeing things that give me hope that diffusion is the next step forward, but so far it's all been early research prototypes.
I worked on it for a more specialized task (query rewriting). It’s blazing fast.
A lot of inference code is built around autoregressive decoding right now. Diffusion is less mature. Not sure if Ollama or llama.cpp support it.
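For anyone curious why the two don't share code paths, here's a toy sketch of the two decode loops (pure Python; toy_model is a made-up stand-in for a real forward pass, not any actual library API):

    import random

    VOCAB = ["the", "cat", "sat", "on", "mat"]
    MASK = "<mask>"

    def toy_model(tokens):
        # Made-up stand-in for a real network: one "forward pass" that
        # returns a (token, confidence) guess for every input position.
        return [(random.choice(VOCAB), random.random()) for _ in tokens]

    def autoregressive_decode(length):
        # One pass per generated token, strictly left to right -- the
        # loop llama.cpp and friends are optimized around.
        seq = []
        for _ in range(length):
            tok, _ = toy_model(seq + [MASK])[-1]  # only the next token matters
            seq.append(tok)
        return seq

    def diffusion_decode(length, steps=4):
        # Start fully masked, then refine ALL positions in parallel each
        # step, committing the most confident guesses.
        seq = [MASK] * length
        per_step = (length + steps - 1) // steps
        for _ in range(steps):
            guesses = toy_model(seq)  # one pass covers the whole sequence
            masked = [i for i, t in enumerate(seq) if t == MASK]
            masked.sort(key=lambda i: guesses[i][1], reverse=True)
            for i in masked[:per_step]:
                seq[i] = guesses[i][0]
        return seq

    print(autoregressive_decode(8))
    print(diffusion_decode(8))

The KV cache, sampling code, batching, everything in the autoregressive stacks assumes the first shape of loop, which is why diffusion support doesn't drop in for free.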
Did you publish anything you could link wrt. query rewriting?
How was the quality?
Quality was about the same. I will say it was a pain to train, since diffusion isn't as popular and there isn't out-of-the-box support.
Interesting, thanks! That's pretty cool though!
Based on my experience running diffusion image models, I really hope this isn't going to take over anytime soon. Parallel decoding may be great if you have a nice parallel GPU or NPU, but it's dog slow on CPUs.
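Rough back-of-envelope (the numbers are made up, and this ignores attention-over-KV-cache costs): with a KV cache, autoregressive decoding does roughly one token's worth of compute per generated token, while every diffusion step re-processes the whole sequence. A GPU hides the extra work behind parallelism; a CPU mostly can't.

    # Very rough; treat one token's forward cost as the unit of work.
    seq_len = 512          # tokens to generate (assumed)
    diffusion_steps = 64   # denoising iterations (assumed)

    autoregressive_work = seq_len                 # ~1 token of compute per step
    diffusion_work = diffusion_steps * seq_len    # each step touches all positions

    print(autoregressive_work)  # 512 units, in 512 sequential steps
    print(diffusion_work)       # 32768 units, but in only 64 parallel-friendly steps

So diffusion trades total compute for fewer sequential steps, which is exactly the trade a GPU likes and a CPU doesn't.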
Because diffusion models have a substantially different refinement process, most current inference software isn't built to support them. So I've also been struggling to find a way to play with these models on my machine. I might see if I can cook something up myself before someone else does...
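If you do roll your own, the core loop is small. Here's a minimal sketch of confidence-based unmasking in PyTorch, assuming a hypothetical model(ids) that returns [batch, seq, vocab] logits and a mask_id for the mask token (check the actual checkpoint's model card for the real interface; LLaDA-style masked-diffusion models work along these lines):

    import torch

    @torch.no_grad()
    def denoise(model, mask_id, length, steps=32, device="cpu"):
        # Start from an all-mask sequence.
        ids = torch.full((1, length), mask_id, device=device)
        for step in range(steps):
            logits = model(ids)                      # one full-sequence pass
            conf, pred = logits.softmax(-1).max(-1)  # per-position best guess
            still_masked = ids == mask_id
            if not still_masked.any():
                break
            # Commit the most confident masked positions this round.
            k = max(1, int(still_masked.sum()) // (steps - step))
            conf = conf.masked_fill(~still_masked, -1.0)
            top = conf.topk(k, dim=-1).indices
            ids[0, top[0]] = pred[0, top[0]]
        return ids

The hard part isn't this loop, it's everything the .gguf ecosystem already gives you for free: quantization, CPU kernels, sampling, tokenizers.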