Can it find broken UI?
Humans can easily spot and report a broken UI by using common sense.
That is simple for a human, but a computer has no common sense. I am a machine learning expert, and I tried and mostly failed to build a broken UI detector at my previous company. They had an automated plugin upgrade process that periodically broke the UI.
I tried to detect it by taking long screenshots: you could select one image as the known-working version, and later screenshots were diffed against it. It kind of worked, but not satisfactorily.
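To make the baseline comparison described above concrete, here is a minimal sketch of that kind of check. It is not the original detector: the function names, the per-channel `tolerance`, and the 5% `threshold` are all assumptions chosen for illustration, and the images are plain nested lists of RGB tuples so the example runs without any image library. In practice the pixel grids would come from decoded screenshots.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def changed_fraction(baseline: Image, candidate: Image, tolerance: int = 16) -> float:
    """Fraction of pixels whose color differs by more than `tolerance`.

    `tolerance` is a hypothetical per-channel slack meant to absorb
    anti-aliasing and minor rendering noise.
    """
    same_shape = len(baseline) == len(candidate) and all(
        len(a) == len(b) for a, b in zip(baseline, candidate)
    )
    if not same_shape:
        # Assumption: a size change (e.g. the page got longer) counts as fully changed.
        return 1.0

    total = 0
    changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if max(abs(x - y) for x, y in zip(pixel_a, pixel_b)) > tolerance:
                changed += 1
    return changed / total if total else 0.0


def looks_broken(baseline: Image, candidate: Image, threshold: float = 0.05) -> bool:
    """Flag the candidate when more than `threshold` of its pixels changed."""
    return changed_fraction(baseline, candidate) > threshold


# Tiny synthetic example: a 10x10 white "page" whose top two rows turn black.
white = [[(255, 255, 255)] * 10 for _ in range(10)]
broken = [row[:] for row in white]
for y in range(2):
    for x in range(10):
        broken[y][x] = (0, 0, 0)
```

A check like this only says that pixels changed, not whether the change is a breakage. Legitimate content updates, dynamic data, or a small layout shift can move many pixels, which is one plausible reason this style of diff ends up working only partially.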
The agents can definitely detect when something is off, since they use VLMs (vision-language models). They don't necessarily compare the UI to previous versions; rather, they have opinionated takes on whether something looks broken or off. So yes!