It's called Virtual Try-On (VTO), and there are plenty of models out there for static images. It's very reasonable to expect video VTO models to emerge soon.

Accurate virtual try-on, however, is quite difficult, and users will quickly learn to distrust platforms that just generate something that "looks right".

You can prompt with a normal size 8 dress and "Kim Jong Un wearing a dress" and it will show you something that doesn't help you understand whether that dress would fit or not. You can ask for a tube dress and it will usually give him a big bust to hold it up. It's not useful for the purpose of visualizing fit.

It will definitely be used for that anyway, just like image models already are for cheap Temu clothes, and our online shopping experience will get worse.

Maybe this needs purpose-built models like vibe-net, or maybe you can train a general-purpose model to do it, but if they were spending the effort necessary to do so, they'd be calling it out.