Not necessarily. "As you can see, this is a Chinese lady. You have seen a number of Chinese ladies in your training set. Imagine the face of this lady so that it won't contradict the fragment visible in the image with the snowflake." (Damn, it's a pseudocode prompt.)

yes, so a stereotypical image. my point is best illustrated if you look at all of the photos of the woman.

Even if you provide another image (which you totally can, btw), the model is still generalizing enough that you could say it's just making a strong guess about what's concealed.

I guess my main point is "this is where you draw the line? at a mostly accurate reconstruction of part of someone's face?" This was science fiction a few years ago. Training the model to accept two images (which it can, just not for the explicit purpose of reconstruction, although it learns that too) seems like a very task-specific, downstream way to handle this issue. This field is now about robust, general ways of eliciting intelligent behavior, not task-specific models.

is it mostly accurate though? how would you know? suppose you had an Asian woman whose face is entirely covered with snow.

sure you could tell AI to remove the snow and some face will be revealed, but who is to say it's accurate? that's why traditionally you have a reference input.

> sure you could tell AI to remove the snow and some face will be revealed, but who is to say it's accurate? that's why traditionally you have a reference input.

As I stated a few times, the model HAS SUPPORT FOR MULTIPLE IMAGES. The article here doesn't try your very specific reference-image benchmark, but that doesn't mean you can't do it yourself. It also doesn't imply there's anything wrong with the article or BFL: they're merely presenting a common use case, not defining how the model should be used.
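
If you wanted to try it yourself, here's a rough sketch of what that could look like against BFL's hosted API. The endpoint and especially the second-image field name are guesses on my part, not confirmed parameter names, so check their docs before relying on this:

```python
import base64
import requests

API_KEY = "..."  # your BFL API key


def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()


# Hypothetical request: edit the occluded photo while conditioning on a second,
# unobstructed photo of the same person. File names are placeholders, and the
# "input_image_2" field is an assumption, not a documented parameter.
resp = requests.post(
    "https://api.bfl.ai/v1/flux-kontext-max",  # assumed endpoint
    headers={"x-key": API_KEY},
    json={
        "prompt": "Remove the snow covering the face; keep the person's "
                  "identity consistent with the second image.",
        "input_image": b64("snow_covered.jpg"),      # photo to edit
        "input_image_2": b64("reference_face.jpg"),  # reference photo
    },
)
resp.raise_for_status()
print(resp.json())  # typically a task id you poll for the finished image
```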

What's the traditional workflow? I haven't seen that done before, but it's something I'd like to try. You could supply the "wrong" reference too, to get something specific.
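
From a quick look, the usual route seems to be reference-guided inpainting: an IP-Adapter bolted onto a diffusers inpainting pipeline, so a clean photo of the person steers what gets filled into the masked region. Something like this, I think (the checkpoint, adapter weights, and file names are placeholders I made up, not a recommendation):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

# SD 1.5-class inpainting pipeline; checkpoint name is illustrative.
pipe = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# IP-Adapter injects the reference photo as an extra conditioning signal.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)  # how strongly the reference steers the fill

image = load_image("snow_covered.jpg")    # photo with the face occluded (placeholder)
mask = load_image("snow_mask.png")        # white over the occluded area (placeholder)
reference = load_image("clear_face.jpg")  # unobstructed photo of the same person

result = pipe(
    prompt="a photo of a woman's face, natural light",
    image=image,
    mask_image=mask,
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
result.save("reconstructed.png")
```

Supplying a deliberately "wrong" reference is then just swapping `clear_face.jpg` for someone else's photo, which is exactly the probe you'd want for checking how much the model actually respects the reference versus guessing.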