I worked on it for a more specialized task (query rewriting). It’s blazing fast.
A lot of inference code is set up for autoregressive decoding now. Diffusion is less mature. Not sure if Ollama or llama.cpp supports it.
Did you publish anything you could link wrt. query rewriting?
How was the quality?
Quality was about the same. I will say it was a pain to train, since it isn’t as popular and there isn’t out-of-the-box support.
Interesting, thanks! That's pretty cool though!