Tried it a bit, and while it is very impressive for a 0.2B model, it would be very hard to convince me that it matches 10B models. It worked reasonably well on natural images, but inpainted regions were visibly smoother than their surroundings, and it performed very badly on novel objects. It is also limited to 512x512 output, which limits its practical usefulness.
Do you think the provided examples are representative of its performance, or do you think they were cherry-picked?
Given its limited output dimensions, it's hard to tell. I haven't exactly tested the fine-tuned variants, but I think they would work well in certain situations. After all, some of the (possibly cherry-picked) examples still exhibit similar problems when you inspect them in detail.