This can definitely be done. There are approaches that turn the decoder part of the autoencoder into another diffusion model. The drawback is that's much more expensive computationally. We think there's still a lot of room for better quality on the AE side and can't wait to show our improvements.